331

Analyse der Meinungsentwicklung in Online Foren – Konzept und Fallstudie / Analysis of Opinion Development in Online Forums – Concept and Case Study

Kaiser, Carolin, Bodendorf, Freimut January 2010 (has links)
Web 2.0 is, among other things, a worldwide platform for expressing opinions. More and more customers discuss products online and exchange experiences, which makes the analysis of online posts an important market research instrument. An approach for the automatic identification, aggregation and analysis of opinions by means of text mining is presented, and its application is demonstrated with an example from the sporting goods industry.
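
For illustration only (not taken from the thesis), a minimal Python sketch of lexicon-based opinion scoring and aggregation over forum posts might look as follows; the polarity lexicon, post texts and product names are invented.

    # Minimal lexicon-based opinion scoring and aggregation (illustrative only).
    from collections import defaultdict

    LEXICON = {"great": 1, "comfortable": 1, "love": 1,
               "poor": -1, "broke": -1, "disappointing": -1}

    posts = [  # (product, text) pairs; invented sample data
        ("RunFlex 3", "Love these shoes, really comfortable on long runs"),
        ("RunFlex 3", "Sole broke after two weeks, disappointing quality"),
        ("TrailPro",  "Great grip, comfortable fit"),
    ]

    scores = defaultdict(list)
    for product, text in posts:
        tokens = text.lower().replace(",", " ").split()
        scores[product].append(sum(LEXICON.get(t, 0) for t in tokens))

    for product, vals in scores.items():
        print(product, "mean opinion score:", sum(vals) / len(vals))
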
332

Konzeption und Entwicklung eines automatisierten Workflows zur geovisuellen Analyse von georeferenzierten Textdaten(strömen) (Microblogging Content) / Conception and Development of an Automated Workflow for the Geovisual Analysis of Georeferenced Text Data Streams (Microblogging Content)

Gröbe, Mathias 13 October 2015 (has links)
This Master's thesis deals with the design and exemplary implementation of a workflow for processing georeferenced microblogging content, using data from Twitter as the example source. A whole range of useful building blocks from the fields of data mining and text mining already exists; for the most part they only need to be arranged into a processing pipeline with appropriate settings. Although a logical order of steps can be defined, further adjustments to the research question and the data at hand may be required. The process is supported by different visualizations such as histograms, word clouds and maps, so that new knowledge can be discovered and the parameterization of the steps refined step by step, following the principles of Geovisual Analytics. After a review of several software products, the programming language R, which is optimized for statistical applications, was chosen for the exemplary implementation. Finally, the workflow was evaluated with data from Twitter and Flickr.
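
The thesis implements its pipeline in R; purely as an illustration of the same idea in Python (tokenise georeferenced messages, aggregate term frequencies per area, and feed the counts into histograms, word clouds or maps), one could start from a sketch like the one below. The sample messages, coordinates and grid size are invented.

    # Sketch: aggregate term frequencies of georeferenced messages per grid cell,
    # as input for histograms, word clouds or map layers.
    from collections import Counter, defaultdict

    messages = [  # invented sample: (lat, lon, text)
        (51.05, 13.73, "traffic jam on the bridge again"),
        (51.04, 13.74, "bridge closed, heavy traffic"),
        (51.34, 12.37, "street festival with live music"),
    ]

    def cell(lat, lon, size=0.1):
        # Snap coordinates to a coarse grid cell (about 11 km in latitude).
        return (round(lat / size) * size, round(lon / size) * size)

    terms_per_cell = defaultdict(Counter)
    for lat, lon, text in messages:
        terms_per_cell[cell(lat, lon)].update(text.lower().split())

    for c, counter in terms_per_cell.items():
        print(c, counter.most_common(3))
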
333

GoWeb: Semantic Search and Browsing for the Life Sciences

Dietze, Heiko 20 October 2010 (has links)
Searching is a fundamental task in research. Current search engines are keyword-based, while semantic technologies promise a next generation of search engines that can answer questions. Current approaches either apply natural language processing to unstructured text or assume the existence of structured statements over which they can reason. This work provides a system that combines classical keyword-based search engines with semantic annotation. Conventional search results are annotated using a customized annotation algorithm that takes the textual properties and requirements such as speed and scalability into account. The biomedical background knowledge consists of the Gene Ontology, Medical Subject Headings, and other related entities, e.g. protein/gene names and person names; together they provide the relevant semantic context for a search engine for the life sciences. We develop the system GoWeb for semantic web search and evaluate it on three benchmarks, showing that GoWeb can aid question answering with success rates of up to 79%. Furthermore, the system includes semantic hyperlinks that enable semantic browsing of the knowledge space and facilitate the use of the eScience infrastructure, even for complex workflows of composed web services. To complement the web search of GoWeb, other data sources and more specialized information needs, including patent and intranet search, are tested in different prototypes. Semantic search is applicable to these usage scenarios, but the developed systems also show the limits of the semantic approach: the size, applicability and completeness of the integrated ontologies, as well as technical issues in text extraction and metadata gathering. Additionally, semantic indexing is implemented as an alternative approach to semantic search and evaluated with a question-answering benchmark. A semantic index can help answer questions and address some limitations of GoWeb, but the maintenance and optimization of such an index remains a challenge, whereas GoWeb provides a straightforward system.
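
As a toy illustration of dictionary-based semantic annotation of search-result snippets (not GoWeb's actual annotation algorithm), the following Python sketch matches snippet text against a tiny hand-made vocabulary; the ontology entries, identifiers and snippets are invented.

    # Sketch: dictionary-based semantic annotation of search-result snippets.
    import re

    ONTOLOGY = {  # invented mini vocabulary: surface form -> (source, identifier)
        "apoptosis": ("GeneOntology", "GO:0006915"),
        "p53": ("Protein", "TP53"),
        "lung cancer": ("MeSH", "D008175"),
    }
    pattern = re.compile("|".join(re.escape(t) for t in ONTOLOGY), re.IGNORECASE)

    snippets = [
        "TP53-independent apoptosis pathways in lung cancer cell lines.",
        "The role of p53 in DNA damage response.",
    ]

    for s in snippets:
        hits = {m.group(0).lower() for m in pattern.finditer(s)}
        annotations = [ONTOLOGY[h] for h in hits if h in ONTOLOGY]
        print(s, "->", annotations)
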
334

Text mining for social harm and criminal justice application

Ritika Pandey (9147281) 30 July 2020 (has links)
Increasing rates of social harm events and the sheer volume of available text data call for text mining techniques, both to better understand the causes of these events and to develop optimal prevention strategies. In this work, we study three social harm issues: crime topic models, transitions into drug addiction, and homicide investigation chronologies. Topic modeling for the categorization and analysis of crime report text allows for more nuanced categories of crime than official UCR categorizations, with important implications for hotspot policing; we investigate the extent to which topic models that improve coherence lead to higher levels of crime concentration. We further explore transitions into drug addiction using Reddit data and propose a prediction model that classifies users' transitions from a casual drug discussion forum to a recovery drug discussion forum and estimates the likelihood of such transitions. Through this study we offer insights into modern drug culture and provide tools with potential applications in combating the opioid crisis. Lastly, we present a knowledge-graph-based framework for homicide investigation chronologies that may aid investigators in analyzing homicide case data and also allows for post hoc analysis of the key features that determine whether a homicide is ultimately solved. For this purpose we perform named entity recognition to identify witnesses, detectives and suspects from the chronology, use keyword expansion to identify various evidence types, and finally link these entities and evidence to construct a homicide investigation knowledge graph. We compare the performance of several methodologies for these sub-tasks and analyze the association between network statistics of the knowledge graph and homicide solvability.
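
As a hedged sketch of the crime topic modelling step (not the author's code or data), the following Python example fits a small LDA model over invented report narratives with scikit-learn and prints the top terms per topic.

    # Sketch: topic modelling over short crime-report narratives with scikit-learn.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    reports = [  # invented sample narratives
        "suspect broke car window and stole a laptop from the back seat",
        "victim reported stolen bicycle outside the grocery store",
        "two men argued in the bar before one punched the other",
        "window smashed and radio stolen from parked vehicle overnight",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(reports)

    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    terms = vectorizer.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = [terms[i] for i in weights.argsort()[-5:][::-1]]
        print(f"topic {k}:", ", ".join(top))
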
335

Kompendium der Online-Forschung (DGOF) / Compendium of Online Research (DGOF)

Deutsche Gesellschaft für Online-Forschung e. V. (DGOF) 24 November 2021 (has links)
Here, the DGOF publishes digital compendia on current topics in online research, with contributions from experts in the field.
336

Aiding Remote Diagnosis with Text Mining / Underlätta fjärrdiagnostik genom textbaserad datautvinning

Hellström Karlsson, Rebecca January 2017 (has links)
This thesis examines how text mining could be applied to patient-reported symptom descriptions and how it could aid doctors in their diagnostic process. Healthcare delivery today struggles to provide care to remote settings, and costs are increasing together with the aging population, yet how much doctors are helped by text mining on patient descriptions is unknown. Investigating whether text mining can aid doctors by presenting additional information, based on what patients with similar descriptions to the current patient have written, is relevant to many healthcare settings: it has the potential to improve the quality of care in remote settings and increase the number of patients treated with the limited resources available. In this work, patient texts were represented using the Bag-of-Words model and clustered with the k-means algorithm. The final clustering model used 41 clusters, and the ten most important words of each cluster centroid were used as representative words for that cluster. An experiment was then performed to gauge how doctors were aided in their diagnostic process when patient texts were paired with these additional words. The results show that the words aided doctors in difficult patient cases and that the clustering algorithm can be used to pose specific follow-up questions to the current patient.
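
A minimal sketch of the described setup, assuming scikit-learn and using invented symptom texts and a much smaller k than the 41 clusters of the final model:

    # Sketch: Bag-of-Words + k-means clustering of symptom descriptions,
    # with the highest-weighted centroid terms used as cluster labels.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.cluster import KMeans

    texts = [  # invented symptom descriptions
        "sore throat and fever since yesterday",
        "throat pain, mild fever, hard to swallow",
        "itchy red rash on both arms",
        "rash spreading on arms and neck, no fever",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(texts)

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    terms = vectorizer.get_feature_names_out()
    for k, centroid in enumerate(km.cluster_centers_):
        top = [terms[i] for i in centroid.argsort()[-10:][::-1]]
        print(f"cluster {k}:", ", ".join(top))
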
337

Automating debugging through data mining / Automatisering av felsökning genom data mining

Thun, Julia, Kadouri, Rebin January 2017 (has links)
Contemporary technological systems generate massive quantities of log messages, which can be stored, searched and visualized efficiently using log management and analysis tools. The analysis of log messages offers insights into system behavior such as performance, server status and execution faults in web applications. iStone AB wants to explore the possibility of automating its debugging process. Since iStone does most of its debugging manually, it takes time to find errors within the system; the aim was therefore to find solutions that reduce the time it takes to debug. An analysis of log messages in access and console logs was carried out in order to choose the most appropriate data mining techniques for iStone's system, and data mining algorithms as well as log management and analysis tools were compared. The comparisons showed that the ELK Stack, together with a mixture of Eclat and a hybrid algorithm (Eclat and Apriori), were the most appropriate choices. To demonstrate their feasibility, the ELK Stack and Eclat were implemented. The results show that data mining and the use of a platform for log analysis can facilitate debugging and reduce the time it takes.
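
As an illustration of the Eclat idea on log data (a simplified sketch, not iStone's system or the exact hybrid algorithm evaluated in the thesis), the following Python example mines frequent token sets from a few invented log lines using vertical tid-lists:

    # Sketch: Eclat-style frequent itemset mining over tokenised log messages,
    # using a vertical layout (item -> transaction-id set) and recursive intersection.
    transactions = [  # invented, pre-tokenised log lines
        {"ERROR", "timeout", "db"},
        {"ERROR", "timeout", "api"},
        {"WARN", "retry", "api"},
        {"ERROR", "db", "deadlock"},
    ]
    MIN_SUPPORT = 2

    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)

    def eclat(prefix, items):
        """Extend `prefix` with every item whose tid-list stays frequent."""
        for i, (item, tids) in enumerate(items):
            if len(tids) >= MIN_SUPPORT:
                print(prefix + [item], "support", len(tids))
                # Intersect tid-lists with the remaining items to grow the itemset.
                suffix = [(other, tids & otids) for other, otids in items[i + 1:]]
                eclat(prefix + [item], suffix)

    eclat([], sorted(tidlists.items(), key=lambda kv: kv[0]))
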
338

Robust relationship extraction in the biomedical domain

Thomas, Philippe 25 November 2015 (has links)
For several centuries, a great wealth of human knowledge has been communicated in natural language, often recorded in written documents. In the life sciences, an exponential increase in scientific articles has been observed, hindering the effective and fast reconciliation of previous findings with current research projects. This thesis studies the automatic extraction of relationships between named entities, focusing on increasing the robustness of relationship extraction. First, we evaluate the use of ensemble methods to improve performance using data provided by the drug-drug interaction challenge 2013. Ensemble methods aggregate several classifiers into one model, increasing robustness by reducing the risk of choosing an inappropriate single classifier. Second, this work discusses the problem of applying relationship extraction to documents with unknown text characteristics. The robustness of a text mining component is assessed by cross-learning, where a model is evaluated on a corpus different from the training corpus. We apply self-training, a semi-supervised learning technique, in order to increase cross-learning performance and show that it is more robust than a classifier trained on manually annotated text only. Third, we investigate the use of distant supervision to overcome the need for manually annotated training instances. Corpora derived by distant supervision are inherently noisy and thus benefit from robust relationship extraction methods. We compare two different methods and show that both achieve performance similar to fully supervised classifiers when evaluated in the cross-learning scenario. To facilitate the use of information extraction results, including those developed within this thesis, we develop the semantic search engine GeneView. We discuss the computational requirements of building this resource and present applications that utilize the data extracted by different text mining components.
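
A generic self-training loop of the kind described above might be sketched in Python as follows; this is not the thesis's feature representation or classifier, and the drug-interaction sentences, labels and confidence threshold are invented.

    # Sketch: self-training for sentence-level relation classification.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    labeled = [("DrugA increases the plasma level of DrugB", 1),
               ("DrugC was administered in the morning", 0),
               ("Co-administration of DrugD enhances DrugE toxicity", 1),
               ("Patients received DrugF for two weeks", 0)]
    unlabeled = ["DrugG potentiates the effect of DrugH",
                 "DrugI tablets are round and white"]

    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    vec = TfidfVectorizer().fit(texts + unlabeled)      # shared vocabulary

    for _ in range(3):                                  # a few self-training rounds
        clf = LogisticRegression().fit(vec.transform(texts), labels)
        if not unlabeled:
            break
        probs = clf.predict_proba(vec.transform(unlabeled))
        confident = np.where(probs.max(axis=1) > 0.8)[0]
        if len(confident) == 0:
            break
        for i in sorted(confident, reverse=True):       # move confident predictions
            texts.append(unlabeled.pop(i))              # into the training set
            labels.append(int(clf.classes_[probs[i].argmax()]))

    print(clf.predict(vec.transform(["DrugX raises the concentration of DrugY"])))
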
339

針對臉書粉絲專頁貼文之政治傾向預測 / Predicting Political Affiliation for Posts on Facebook Fan Pages

張哲嘉, Chang, Che Chia Unknown Date (has links)
Social media, especially Facebook, has become increasingly popular in recent years. Taiwan has more than 15 million Facebook users, ranging from public figures to the general public, and these platforms contain many meaningful messages that reflect users' emotions and affiliations. Using social media data to predict election results and political affiliation has become a clear trend in Taiwan: politicians campaign and gauge the polls through the Internet and social media, and every political party maintains its own fan pages. In this thesis, we predict the political inclination of fan-page posts, focusing on the KMT and the DPP, the two largest political parties in Taiwan. We collect posts from the fan pages of both parties and build two prediction models, one based on distinct-word (literal) features and one based on combined text and interaction features. Using data mining techniques, we construct classifiers from the party-specific characteristics of the posts, examine several feature combinations and their influence on prediction quality, and test whether class imbalance in the data affects the classification results. The results show that combining typical party words as text features with log-transformed interaction features and a KNN classifier works best, reaching an accuracy of 0.908 and an F1-score of 0.827.
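
A rough sketch of the winning configuration as described in the abstract (word features combined with log-transformed interaction counts and a KNN classifier), with invented posts, counts and party labels, could look like this in Python:

    # Sketch: combining word features with log-scaled interaction counts
    # and classifying party affiliation with a KNN classifier.
    import numpy as np
    from scipy.sparse import hstack, csr_matrix
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.neighbors import KNeighborsClassifier

    posts = ["reform proposal for energy policy announced today",
             "join our rally this weekend in the city square",
             "new labour policy draft published for discussion",
             "thank you for supporting our campaign event"]
    likes_shares = np.array([[120, 10], [3400, 210], [95, 8], [2800, 150]])
    labels = ["A", "B", "A", "B"]                  # invented party labels

    vec = CountVectorizer()
    X_text = vec.fit_transform(posts)
    X_inter = csr_matrix(np.log1p(likes_shares))   # log-scale interaction counts
    X = hstack([X_text, X_inter])

    knn = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
    new_post = hstack([vec.transform(["huge rally turnout, thank you all"]),
                       csr_matrix(np.log1p([[3100, 180]]))])
    print(knn.predict(new_post))
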
340

運用文字探勘技術輔助建構法律條文之語意網路-以公司法為例 / Using Text Mining to Assist in Constructing a Semantic Network of Legal Provisions: The Company Act as an Example

張露友 Unknown Date (has links)
This thesis applies text mining techniques to automatically calculate the similarity between legal provisions, helping experts derive rules from the many articles of the Company Act and establish relations between them, so that the code is no longer an independent collection of article numbers and texts but a network of articles linked by meaning. Based on well-defined legal concepts and knowledge, the thesis also tries to explain and evaluate these automatically computed relations, and, in the process of analyzing and interpreting them, discusses the difficulties and limitations of applying text mining to legal provisions as a reference for future research. On the positive side, the results can contribute to a methodology for using text mining to support legal knowledge extraction; conversely, if the study shows that text mining does not readily apply to knowledge extraction in the legal domain, the conclusions and suggestions of this thesis can still serve as an important reference for professionals in similar research fields who aim to improve the relevant techniques.
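
As a hedged illustration of the similarity computation (not the thesis's implementation), the following Python sketch compares statute articles via TF-IDF vectors and cosine similarity; the article texts are invented English stand-ins, and real Chinese statutes would additionally require word segmentation before vectorization.

    # Sketch: pairwise similarity between statutory articles via TF-IDF vectors
    # and cosine similarity.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    articles = {  # invented English stand-ins for article texts
        "Art. 1": "a company is a corporate body organized under this act for profit",
        "Art. 8": "the responsible person of a company includes directors and managers",
        "Art. 23": "the responsible person shall exercise due care in conducting company business",
    }

    names = list(articles)
    tfidf = TfidfVectorizer().fit_transform(articles.values())
    sim = cosine_similarity(tfidf)

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            print(names[i], "<->", names[j], round(float(sim[i, j]), 3))
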
