Global ETD Search

1	Describing Trail Cultures through Studying Trail Stakeholders and Analyzing their Tweets Bartolome, Abigail Joy 08 August 2018 (has links) While many people enjoy hiking as a weekend activity, to many outdoor enthusiasts there is a hiking culture with which they feel affiliated. However, the way that these cultures interact with each other is still unclear. Exploring these different cultures and understanding how they relate to each other can help in engaging stakeholders of the trail. This is an important step toward finding ways to encourage environmentally friendly outdoor recreation practices and developing hiker-approved (and environmentally conscious) technologies to use on the trail. We explored these cultures by analyzing an extensive collection of tweets (over 1.5 million). We used topic modeling to identify the topics described by the communities of Triple Crown trails. We labeled training data for a classifier that identifies tweets relating to depreciative behaviors on the trail. Then, we compared the distribution of tweets across various depreciative trail behaviors to those of corresponding blog posts in order to see how tweets reflected cultures in comparison with blog posts. To harness metadata beyond the text of the tweets, we experimented with visualization techniques. We combined those efforts with ethnographic studies of hikers and conservancy organizations to produce this exploration of trail cultures. In this thesis, we show that through the use of natural language processing, we can identify cultural differences between trail communities. We identify the most significantly discussed forms of trail depreciation, which is helpful to conservation organizations so that they can more appropriately share which Leave No Trace practices hikers should place extra effort into practicing. 
/ Master of Science / In Wild, a memoir of her hike on the Pacific Crest Trail, Cheryl Strayed said to a reporter in an amused tone, “I’m not a hobo, I’m a long-distance hiker.” While many people enjoy hiking as a weekend activity, to many outdoor enthusiasts there is a hiking culture with which they feel affiliated. There are cultures of trail conservation, and cultures of trail depreciation. There are cultures of long-distance hiking, and there are cultures of day hiking and weekend warrior hiking. There are also cultures across different hiking trails, where the hikers of one trail have different sets of values and behaviors than those of another trail. However, the way that these cultures interact with each other is still unclear. Exploring these different cultures and understanding how they relate to each other can help in engaging stakeholders of the trail. This is an important step toward finding ways to encourage environmentally friendly outdoor recreation practices and developing hiker-approved (and environmentally conscious) technologies to use on the trail. We decided to explore these cultures by analyzing an extensive collection of tweets (over 1.5 million). We combined those efforts with ethnographic-style studies of conservancy organizations and avid hikers to produce this exploration of trail cultures. Hikers Natural language processing Topic analysis Twitter Technology on the trail
2	Who, what and when: how media and politicians shape the Brazilian debate on foreign affairs / Quem, o que e quando: como a mídia e os políticos moldam o debate sobre política externa no Brasil Hardt, Matheus Soldi 10 July 2019 (has links) What do politicians talk about when discussing foreign affairs? Are these topics different from the ones in the newspapers? Finally, can unsupervised methods be used to help us understand these problems? Answering these questions is of paramount importance to understanding the relationship between foreign policy and mass media. Based on this discussion, this research has three main objectives: (a) to verify whether unsupervised methods can be used to analyze documents on international issues; (b) to understand the issues that politicians talk about when dealing with foreign affairs; and (c) to understand when and how often the mass media publish news on certain international topics. To do so, I created two new corpora: one with news articles published in the international section of two major Brazilian newspapers, and one with all speeches made within the two Committees on Foreign Affairs of the National Congress of Brazil. I ran a topic model using Latent Dirichlet Allocation (LDA) on both. The results of this topic model show that LDA can be used to distinguish different international issues that appear in both political discourse and the mass media in Brazil. Additionally, I found that the LDA model can be used to identify when some topics are debated and for how long. The findings also demonstrate that Brazilian politicians and Brazilian newspapers are neither isolated nor unstable with regard to international issues. / What do politicians talk about when they discuss international issues? Do these topics differ from those that appear in the newspapers? Finally, can unsupervised methods be used to help us understand these problems? 
Answering these questions is of the utmost importance for understanding the relationship between foreign policy and mass media. Building on this discussion, this research has three main objectives: (a) to verify whether unsupervised methods can be used to analyze documents on international issues; (b) to understand which subjects politicians talk about when dealing with foreign relations; and (c) to understand when and for how long the mass media publish news on particular international topics. To that end, I created two new corpora: one with news published in the international section of two of the main Brazilian newspapers, and one with all speeches made within the two Foreign Relations Committees of the Brazilian Congress. I ran a topic model using Latent Dirichlet Allocation (LDA) on both. The results of this topic model show that it can be used to distinguish different international issues that appear both in political discourse and in the mass media in Brazil. In addition, the model can be used to identify when some topics are debated and for how long. The results also demonstrate that neither Brazilian politicians nor Brazilian newspapers are isolated or unstable with respect to international issues. Brazil LDA Newspapers Political speeches Topic analysis
3	Personalized Document Recommendation by Latent Dirichlet Allocation Chen, Li-Zen 13 August 2012 (has links) With the rapid growth of the Internet, people around the world can easily distribute, browse, and share information online. The enormous amount of information, however, causes an information overload problem that exceeds users' limited information processing ability. Recommender systems have therefore arisen to help users find useful information when they cannot describe their requirements precisely. The filtering techniques in recommender systems can be divided into content-based filtering (CBF) and collaborative filtering (CF). Although CF has been shown to be superior to CBF in the literature, personalized document recommendation relies more on CBF simply because documents are textual by nature. Nevertheless, the document recommendation task provides a good opportunity to integrate both techniques into a hybrid one and enhance overall recommendation performance. The objective of this research is thus to propose a hybrid filtering approach for personalized document recommendation. In particular, latent Dirichlet allocation, which uncovers latent semantic structure in documents, is incorporated either to obtain robust document similarity in CF or to explore user profiles in CBF. Two experiments are conducted accordingly. The results show that the proposed approach outperforms its counterparts in recommendation performance, which supports its feasibility in real applications. recommender systems collaborative filtering hidden topic analysis latent Dirichlet allocation content-based filtering
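The idea of obtaining document similarity from latent topics can be illustrated with a minimal sketch. It assumes each document has already been reduced to a topic-proportion vector by some LDA implementation; the vectors, the document IDs, and the choice of cosine similarity are illustrative assumptions, not details taken from the thesis.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_p * norm_q)


def most_similar(target, topic_vectors, k=2):
    """Return the IDs of the k documents closest to `target` in topic space."""
    ranked = sorted(
        (
            (cosine_similarity(topic_vectors[target], vec), doc_id)
            for doc_id, vec in topic_vectors.items()
            if doc_id != target
        ),
        reverse=True,
    )
    return [doc_id for _, doc_id in ranked[:k]]


# Hypothetical topic proportions (three topics) for three documents.
doc_topics = {
    "d1": [0.8, 0.1, 0.1],
    "d2": [0.7, 0.2, 0.1],
    "d3": [0.1, 0.1, 0.8],
}
nearest = most_similar("d1", doc_topics, k=1)
```

Comparing documents in topic space rather than by raw word overlap is one way two documents with different vocabulary but similar themes can still be judged similar.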
4	Value Creation From User Generated Content for Smart Tourism Destinations Celen, Mustafa, Rojas, Maximiliano January 2020 (has links) This paper aims to show how User Generated Content (UGC) can create value for Smart Tourism Destinations (STDs). The analysis is applied to 5 different cases in the Stockholm region to derive patterns and opportunities for value creation generated by UGC in tourism. The findings are also discussed in terms of improving decision making, possibilities for new business models, and the importance of technological improvements for STDs. Finally, thoughts on models are presented for researchers and practitioners who might be interested in exploiting UGC in information-intensive industries, mainly tourism. Google Trends Tripadvisor Smart Tourism Destinations User Generated Content NLP Text Mining Topic Analysis Sentiment Analysis Social Sciences
5	Dialogen i socialt arbete : en studie av socialbyråsamtal i ljuset av modern dialogteori / On the Nature of Dialogue in Social Work : A study of verbal encounters at the social welfare office in the light of modern dialogue theory Fredin, Erik January 1993 (has links) This dissertation concerns the dialogical encounters that take place between social worker and client within the social services. The empirical material consists of 21 dialogues, 19 of which were recorded at two social welfare offices, one located in the center of Stockholm and the other in a suburb, and two of which were recorded at the youth counseling office. In all, the material comprises 5150 individual utterances. In the literature, the type of dialogical encounter studied is called "institutional discourse". The theoretical approach in the analysis is based on an interactionist perspective, particularly with respect to dialogical analysis of verbal encounters. The aim of the study is to analyse the dialogue between social worker and client from three points of departure: participant structure, topic, and inherent perspectives. In studying the discourse, the initiative-response (IR) method of analysis was applied. The IR method focuses on the dynamics of the dialogue, its pattern of dominance and its coherence. The results show that the social worker was interactively dominant in about two-thirds of the dialogues studied, and that this dominance was most forcefully evident with respect to soliciting initiatives. With respect to content, the study concerns how the topical structure of the dialogue is constructed, i.e. which topics are taken up in the discourse and which of the parties steers the transition between topics. In the case of transitions to a topic defined as the main issue of the dialogue, it was the social worker who, in four-fifths of the cases, strategically steered the course of the dialogue. 
The perspective analysis undertaken in the study is concerned with investigating how different topics are treated in the dialogues, i.e. which of the parties "establishes" the perspectives and which "submits" to them. This analysis, made on a selection of seven of the 21 dialogical encounters, shows that it was the social worker who, in about two-thirds of the cases, established by means of various interactive moves a bureaucratic or social welfare perspective on the main topic sequences. Lastly, discursive dominance and power are discussed in the light of the preceding analyses. Dialogue dialogue theory discursive dominance initiative-response analysis institutional discourse perspective analysis power relation social work topic analysis utterance Conversation methodology
6	AIM - A Social Media Monitoring System for Quality Engineering Bank, Mathias 27 June 2013 (has links) (PDF) In the last few years the World Wide Web has dramatically changed the way people communicate with each other. The growing availability of Social Media systems such as Internet fora, weblogs and social networks ensures that the Internet is today what it was originally designed for: a technical platform on which all users are able to interact with each other. Nowadays, there are billions of user comments available discussing all aspects of life, and the data source is still growing. This thesis investigates whether it is possible to use this growing amount of freely provided user comments to extract quality-related information. The concept is based on the observation that customers do not only post marketing-relevant information; they also publish product-oriented content, including positive and negative experiences. It is assumed that this information represents a valuable data source for quality analyses: the original voices of the customers promise a more exact and more concrete definition of "quality" than the one available to manufacturers or market researchers today. However, the huge amount of unstructured user comments makes their evaluation very complex. It is impossible for an analyst to investigate the provided customer feedback manually. Therefore, Social Media specific algorithms have to be developed to collect, pre-process and finally analyze the data. This has been done by the Social Media monitoring system AIM (Automotive Internet Mining), which is the subject of this thesis. It investigates how manufacturers, products, product features and related opinions are discussed in order to estimate the overall product quality from the customers' point of view. AIM is able to track different types of data sources using a flexible multi-agent based crawler architecture. 
In contrast to classical web crawlers, the multi-agent based crawler supports individual crawling policies to minimize the download of irrelevant web pages. In addition, an unsupervised wrapper induction algorithm is introduced to automatically generate content extraction parameters that are specific to the crawled Social Media systems. The extracted user comments are analyzed by different content analysis algorithms to gain a deeper insight into the discussed topics and opinions. Three different topic types are supported, depending on the analysis needs. * Highly reliable analysis results are produced by a special context-aware, taxonomy-based classification system. * Fast ad-hoc analyses are applied on top of classical full-text search capabilities. * Finally, AIM supports the detection of blind spots by using a new fuzzified hierarchical clustering algorithm, which generates topical clusters while supporting multiple topics within each user comment. All three topic types are treated in a unified way so that an analyst can apply all methods simultaneously and interchangeably. The systematically processed user comments are visualized within an easy and flexible interactive analysis frontend. Special abstraction techniques support the investigation of thousands of user comments with minimal time effort. Specifically created indices show the relevance of a given topic and customer satisfaction with it. / In recent years the World Wide Web has changed dramatically. Whereas a few years ago it was primarily an information source in which only a small share of users could publish content, it has developed into a communication platform in which every user can actively participate. The resulting volume of data covers every aspect of daily life, including quality topics. Analyzing this data promises to improve quality assurance measures significantly. 
It makes it possible to address topics that are hard to measure with classical sensors. The systematic and reproducible analysis of user-generated data, however, requires adapting existing tools and developing new Social Media specific algorithms. To this end, this thesis creates a completely new Social Media monitoring system that allows an analyst to examine thousands of user posts with minimal time effort. Applying the system has revealed several advantages that make it possible to identify the customer-driven definition of "quality". Social Media Quality Crawler Topic Analysis Cluster Fuzzy Clustering NLP Monitoring ddc:500
7	Automatisierte Verfahren für die Themenanalyse nachrichtenorientierter Textquellen / Automated Methods for the Topic Analysis of News-Oriented Text Sources Niekler, Andreas 20 January 2016 (has links) (PDF) Topic analysis is an important component of content analysis in media studies. For analyzing the thematic structures of large digital text collections, it is therefore important to investigate the potential of automated, computer-assisted methods. In doing so, the methodological and analytical requirements of content analysis, which also apply to topic analysis, must be observed and reflected. This thesis investigates the possibilities of automating topic analysis and the prospects for applying it. It draws on the theoretical and methodological foundations of content analysis and on linguistic theories of topic structure to derive requirements for an automatic analysis. The main contribution is an investigation of the potential and the tools from data and text mining that can be used helpfully and profitably for content-analytic work in text databases. In addition, an exemplary analysis is carried out to demonstrate the applicability of automatic methods to topic analysis. The thesis also demonstrates ways of using interactive interfaces, formulates the idea and implementation of suitable software, and shows the application of a possible workflow for topic analysis. Its presentation of the potential of automated topic studies in large digital text collections contributes to research on automated content analysis. Starting from the requirements placed on a topic analysis, the thesis shows which text mining methods and automatic procedures can come close to meeting them. 
In summary, two requirements stand out, and meeting each one affects the other. First, the topics in a complex document collection must be captured quickly, in order to map the collection's content structure and to contrast topics. Second, the topics must be representable at a sufficient level of detail to allow an analysis of the sense and meaning of their content. Both approaches are methodologically anchored in the quantitative and qualitative approaches of content analysis. The thesis discusses these parallels and relates automatic procedures and algorithms to the requirements. It identifies methods that allow a semantic, and thus thematic, separation of the data and provide an abstracted overview of large document sets. These are methods such as topic models or clustering procedures. With the help of these algorithms it is possible to produce thematically coherent subsets of a document collection and to make their thematic content available for summaries. The thesis shows that the topics remain distinguishable despite the distant view, and that their frequencies and distributions in a text collection can be displayed diachronically. Preparing the data in this way permits the analysis of thematic trends or the selection of particular thematic aspects from a large number of documents. Diachronic examination of thematically coherent document sets thus becomes possible, and the frequencies of topics over time can be analyzed. For the detailed interpretation and summarization of topics, further representations and information must be derived from the topic contents. The thesis shows that meanings, statements, and contexts can be made visible through a co-occurrence analysis of the documents belonging to a topic. 
In a form of application that takes reading direction and parts of speech into account, frequently occurring word sequences or statements within a topic can be captured statistically. The phrases generated in this way can be used to define categories or contrasted with other topics, publications, or theoretical assumptions. In addition, diachronic analyses of individual words, word groups, or proper names within a topic are suitable for identifying topic phases, key terms, or news factors. The information obtained can be supplemented by a close reading of thematically relevant documents, which the thematic separation of the document sets makes possible. Beyond these methodological perspectives, the automated analyses can be used as empirical measuring instruments in the context of further communication-science theories not discussed here. Furthermore, the thesis shows that graphical interfaces and software frameworks for automated topic analysis are feasible and practical to use. In this respect, the discussion shows how the solutions and approaches described can be put into practice. The thesis makes substantial contributions to research on automated content analysis. Above all, it documents the scholarly engagement with automated topic analysis. In the course of this work, the author developed suitable procedures for using text mining methods in practice for content analysis. Among other things, contributions were made to the visualization and ease of use of different methods. Methods from topic modelling, clustering, and co-occurrence analysis had to be adapted so that they can be applied in content-analytic settings. 
Further contributions arose from the methodological classification of computer-assisted topic analysis and from the definition of innovative applications in this area. The experiments and investigations for this thesis were carried out entirely in specially developed software, which is also being used successfully in other projects. Around this system, processing chains, data storage, visualization, graphical interfaces, options for data interaction, machine learning methods, and document retrieval components were implemented. As a result, the complex methods and procedures for automatic topic analysis become easy to apply and are available in a user-friendly form for future projects and analyses. Social scientists, political scientists, and communication scientists can work with the software environment and carry out content analyses without having to master the details of the automation and computer support. Text Mining Content Analysis Topic Analysis Machine Learning ddc:500
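The diachronic view of topic frequencies described in this abstract can be illustrated with a small sketch. It assumes every document already carries a time period and a single dominant topic label (for example from a topic model or a clustering step); the labels, the monthly periods, and the function name are hypothetical and are not taken from the thesis software.

```python
from collections import Counter, defaultdict


def topic_frequencies_by_period(documents):
    """Map each topic to {period: share of that period's documents}.

    `documents` is an iterable of (period, topic) pairs.
    """
    per_period = defaultdict(Counter)
    for period, topic in documents:
        per_period[period][topic] += 1

    series = defaultdict(dict)
    for period in sorted(per_period):
        total = sum(per_period[period].values())
        for topic, count in per_period[period].items():
            # Relative frequency keeps periods with different
            # document counts comparable.
            series[topic][period] = count / total
    return {topic: dict(points) for topic, points in series.items()}


# Hypothetical input: one (period, dominant topic) pair per document.
docs = [
    ("2015-01", "election"),
    ("2015-01", "election"),
    ("2015-01", "sports"),
    ("2015-02", "sports"),
]
trend = topic_frequencies_by_period(docs)
```

Plotting each topic's series over the sorted periods would then show when a topic rises and fades, which is the kind of trend analysis the abstract refers to.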
8	Automatisierte Verfahren für die Themenanalyse nachrichtenorientierter Textquellen / Automated Methods for the Topic Analysis of News-Oriented Text Sources Niekler, Andreas 13 January 2016 (has links) info:eu-repo/classification/ddc/500 ddc:500
9	中文文本探勘工具：主題分析、詞組關聯強度、相關句擷取 / Tools for Chinese Text Mining: Topic Analysis, Association Strengths of Collocations, Extraction of Relevant Statements 林書佑, Lin, Shu Yu Unknown Date (has links) 現今資料大量且快速數位化的時代，各領域對資訊探勘分析技術越趨倚重。而在數位人文中領域中從2009年「數位典藏與數位人文國際研討會」開始，此議題逐漸受到重視，主要目的為將數位文物結合資訊分析與圖像化輔助，透過不同層面的詮釋建構出更完整的文物資訊。本研究建構一個針對各種中文語料分析的工具，藉由latent semantic analysis、pointwise mutual information、Person’s chi-squared test、typed dependencies distance、word2vec、Gibbs sampling for latent Dirichlet allocation等計算語料中關鍵詞彙關聯強度的方法，並結合分群方法找出可能的主題，最後擷取符合分群結果的相關句子予以輔助人文學者分析詮釋。透過提供各種觀察語料的面向，進而提升語料相關研究學者的效率。我們利用《人民日報》、《新青年》、《聯合報》、《中國時報》作為實驗與測試的中文語料。且將《新青年》藉由此套工具分析後的結果提供給專業人文學者，做為分析詮釋的參考資訊與佐證依據，並在「2015年數位典藏與數位人文國際研討會」中發表論文。目前我們透過各種中文語料評估工具的效能，且在未來將公開此套工具提供給更多學者使用，節省對於語料分析的時間。 / In recent years, a wide variety of text documents have been transformed into digital format. Hence, using data mining techniques to analyze data is becoming more and more popular in many research fields. The digital humanities gradually have taken seriously since "International Conference of Digital Archives and Digital Humanities" began in 2009. The main purpose of the digital heritage combined with information analysis and visualization could improve the effectiveness of cultural information through different levels of interpretation. In this study, we construct a set of tools for Chinese text mining, calculating associated strengths of collocations work through latent semantic analysis, pointwise mutual information, Person’s chi-squared test, typed dependencies distance, word2vec, and Gibbs sampling for latent Dirichlet allocation etc. The tools employ clustering method to identify the possible topics, meanwhile, the tools will extract the relevant statements according to the clustering results. These clustering and relevant statements contribute and improve the efficiency of humanities scholars’ analysis through providing a variety of observations about the corpora. 
At the experimental stage of this study, we used the "People's Daily", "New Youth", "United Daily News", and "China Times" as test corpora. The results of analyzing "New Youth" with the tools were given to humanities scholars as reference information and supporting evidence for their interpretation, and a paper was published at the 2015 International Conference of Digital Archives and Digital Humanities. Currently, we are assessing the effectiveness of the tools on a variety of Chinese corpora. In the future, we will make the tools publicly available so that more scholars can use them and spend less time on corpus analysis. Text Mining Topic Analysis Association Strengths of Collocations Extraction of Relevant Statements
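One of the association measures named in entry 9, pointwise mutual information (PMI), can be sketched briefly. The following is only an illustration of the general measure, not the thesis's implementation: the function name `pmi_scores`, the restriction to adjacent token pairs, and the assumption that the text is already segmented into tokens are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def pmi_scores(sentences, min_count=1):
    """Return {(w1, w2): PMI in bits} for adjacent token pairs.

    `sentences` is a list of token lists. Word segmentation (needed for
    Chinese text) is assumed to have been done beforehand.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs only

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unstable
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        # PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) )
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores
```

A higher score means the two tokens co-occur more often than their individual frequencies would suggest. In practice a `min_count` above 1 is a common guard, because PMI tends to overrate pairs that appear only once or twice.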
10	AIM - A Social Media Monitoring System for Quality Engineering Bank, Mathias 14 June 2013 (has links) In the last few years the World Wide Web has dramatically changed the way people communicate with each other. The growing availability of Social Media systems such as Internet forums, weblogs and social networks ensures that the Internet is today what it was originally designed to be: a technical platform on which all users are able to interact with each other. Nowadays, there are billions of user comments available discussing all aspects of life, and the data source is still growing. This thesis investigates whether it is possible to use this growing amount of freely provided user comments to extract quality-related information. The concept is based on the observation that customers do not only post marketing-relevant information. They also publish product-oriented content, including positive and negative experiences. It is assumed that this information represents a valuable data source for quality analyses: the original voices of the customers promise a more exact and more concrete definition of "quality" than the one available to manufacturers or market researchers today. However, the huge amount of unstructured user comments makes their evaluation very complex. It is impossible for an analyst to manually investigate the provided customer feedback. Therefore, Social Media specific algorithms have to be developed to collect, pre-process and finally analyze the data. This has been done by the Social Media monitoring system AIM (Automotive Internet Mining), which is the subject of this thesis. It investigates how manufacturers, products, product features and related opinions are discussed in order to estimate the overall product quality from the customers' point of view. AIM is able to track different types of data sources using a flexible multi-agent based crawler architecture.
In contrast to classical web crawlers, the multi-agent based crawler supports individual crawling policies to minimize the download of irrelevant web pages. In addition, an unsupervised wrapper induction algorithm is introduced to automatically generate content extraction parameters that are specific to the crawled Social Media systems. The extracted user comments are analyzed by different content analysis algorithms to gain a deeper insight into the discussed topics and opinions. Three different topic types are supported, depending on the analysis needs.
* Highly reliable analysis results are produced by a special context-aware taxonomy-based classification system.
* Fast ad-hoc analyses are applied on top of classical full-text search capabilities.
* Finally, AIM supports the detection of blind spots by using a new fuzzified hierarchical clustering algorithm. It generates topical clusters while supporting multiple topics within each user comment.
All three topic types are treated in a unified way so that an analyst can apply all methods simultaneously and interchangeably. The systematically processed user comments are visualized in an easy-to-use and flexible interactive analysis frontend. Special abstraction techniques support the investigation of thousands of user comments with minimal time effort.
Specifically created indices show the relevancy and customer satisfaction of a given topic.
1 Introduction 1.1 Chapter Overview 2 Problem Definition and Data Environment 2.1 Commonly Applied Quality Sensors 2.2 The Growing Importance of Social Media 2.3 Social Media based Quality Experience 2.4 Change to the Holistic Concept of Quality 2.5 Definition of User Generated Content and Social Media 2.6 Social Media Software Architectures 3 Data Collection 3.1 Related Work 3.2 Requirement Analysis 3.3 A Blackboard Crawler Architecture 3.4 Semi-supervised Wrapper Generation 3.5 Structure Modification Detection 3.6 Conclusion 4 Hierarchical Fuzzy Clustering 4.1 Related Work 4.2 Generalization of Agglomerative Crisp Clustering Algorithms 4.3 Topic Groups Generation 4.4 Evaluation 4.5 Conclusion 5 A Social Media Monitoring System for Quality Analyses 5.1 Related Work 5.2 Pre-Processing Workflow 5.3 Quality Indices 5.4 AIM Architecture 5.5 Evaluation 5.6 Conclusion 6 Conclusion and Perspectives 6.1 Contributions and Conclusions 6.2 Perspectives Bibliography
/ In recent years the World Wide Web has changed dramatically. A few years ago it was still primarily a source of information in which only a small share of users could publish content; it has since developed into a communication platform in which every user can actively take part. The resulting volume of data covers every aspect of daily life, including quality topics. Analyzing this data promises to significantly improve quality assurance measures, because it makes it possible to address topics that are difficult to measure with classical sensors. However, the systematic and reproducible analysis of user-generated data requires adapting existing tools and developing new Social Media specific algorithms.
For this purpose, this thesis creates an entirely new Social Media monitoring system with which an analyst can analyze thousands of user contributions with minimal time expenditure. Applying the system has revealed several advantages that make it possible to recognize the customer-driven definition of "quality". info:eu-repo/classification/ddc/500 ddc:500 Social Media;
