1 |
Topic Retrospection with Storyline-based Summarization on News Reports
Liang, Chia-Hao 18 July 2005 (has links)
Electronic newspapers have become a main source for online news readers. When facing numerous stories, news readers need support in order to review a topic in a short time. Because previous research in TDT (Topic Detection and Tracking) only considered how to identify events and present the results with news titles and keywords, a summarized text that presents event evolution is necessary for general news readers to retrospect events under a news topic.
This thesis proposes a topic retrospection process and implements the SToRe system, which identifies the various events under a news topic and constructs the relationships among them to compose a summary that gives readers a sketch of event evolution in the topic. It consists of three main functions: event identification, main storyline construction, and storyline-based summarization. The constructed main storyline removes irrelevant events and presents a main theme. The summarization extracts representative sentences and takes the main theme as the template for composing the summary. The summary not only provides enough information to comprehend the development of a topic, but also serves as an index to help readers find more detailed information.
A lab experiment was conducted to evaluate the SToRe system in a question-and-answer (Q&A) setting. The experimental results show that the SToRe system helps news readers capture the development of a topic more effectively and efficiently.
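The storyline-based summarization step described above can be sketched as follows: for each event on the main storyline, pick the sentence that best matches that event's keywords and concatenate the picks in storyline order. This is a minimal illustration, not the SToRe implementation; the event keywords and sentences below are invented, and the upstream steps (event identification, storyline construction) are not shown.

```python
# Toy storyline-based summarization: one representative sentence per event.
def pick_representative(sentences, keywords):
    """Score each candidate sentence by keyword overlap and return the best one."""
    def score(sentence):
        words = set(sentence.lower().split())
        return len(words & keywords)
    return max(sentences, key=score)

def summarize(storyline):
    """storyline: ordered list of (event_keywords, event_sentences) pairs."""
    return " ".join(pick_representative(sents, kws) for kws, sents in storyline)

# Invented example events for a single news topic.
storyline = [
    ({"typhoon", "landfall"},
     ["The typhoon made landfall on Monday.", "Traffic was light."]),
    ({"evacuation", "ordered"},
     ["Officials ordered an evacuation of coastal towns.", "Markets reopened."]),
]
summary = summarize(storyline)
```

The main-theme template in the real system would impose more structure than simple concatenation; keyword overlap stands in for whatever sentence-scoring function the thesis uses.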
|
2 |
CLAN: Communities in Lexical Associative Networks
Vanarase, Aashay K. January 2015 (has links)
No description available.
|
3 |
Automatická detekce témat, segmentace a vizualizace on-line kurzů / Automatic Topic Detection, Segmentation and Visualization of On-Line Courses
Řídký, Josef January 2016 (has links)
The aim of this work is to create a web application for automatic topic detection and segmentation of on-line courses. During playback of a processed recording, the application should be able to offer recordings from thematically related on-line courses. This document contains the problem description, a list of the tools used, a description of the implementation, the principle of operation, and a description of the final user interface.
|
4 |
Using WordNet Synonyms and Hypernyms in Automatic Topic Detection
Wargärde, Nicko January 2020 (has links)
Detecting topics by extracting keywords from written text using TF-IDF has been studied and successfully used in many applications. Adding a semantic layer to TF-IDF-based topic detection using WordNet synonyms and hypernyms has been explored in document clustering, either by assigning concepts that describe texts or by adding all synonyms and hypernyms of occurring words to a list of keywords. This paper explores a new method in which TF-IDF scores are calculated and the TF-IDF scores of WordNet synset members are added to all occurring synonyms and/or hypernyms. The approach is evaluated by comparing keywords extracted using TF-IDF and the newly proposed method, SynPlusTF-IDF, against manually assigned keywords in a database of scientific abstracts. As topic detection is widely used in many contexts and applications, improving current methods is of great value, since the methods can become more accurate at extracting correct and relevant keywords from written text. An experiment was conducted comparing the two methods, with accuracy measured using precision and recall and by calculating F1-scores. The F1-scores ranged from 0.11131 to 0.14264 across different variables, and the results show that SynPlusTF-IDF is not better at topic detection than TF-IDF; both methods performed poorly on the chosen dataset.
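The core idea — compute TF-IDF, then add each term's score onto the scores of its synonyms — can be sketched in a few lines. This is a hedged reconstruction from the abstract, not the paper's code: the tiny `SYNONYMS` map stands in for WordNet synsets, and the exact weighting SynPlusTF-IDF uses is an assumption.

```python
# Sketch of the SynPlusTF-IDF idea: plain TF-IDF plus synonym score propagation.
import math
from collections import Counter

SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}  # stand-in for WordNet synsets

def tfidf(docs):
    """Return one {term: tf-idf score} dict per document."""
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    n = len(docs)
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return out

def syn_plus_tfidf(docs):
    """Add each term's TF-IDF score onto the scores of its synonyms."""
    boosted = []
    for scores in tfidf(docs):
        new = dict(scores)
        for term, score in scores.items():
            for syn in SYNONYMS.get(term, ()):
                new[syn] = new.get(syn, 0.0) + score
        boosted.append(new)
    return boosted

docs = [["car", "automobile", "engine"], ["recipe", "flour", "sugar"]]
plain = tfidf(docs)
boosted = syn_plus_tfidf(docs)
```

In this toy corpus, "car" and "automobile" reinforce each other, so both end up with twice their plain TF-IDF score, while "engine" is unchanged.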
|
5 |
Extracción y recuperación de información temporal / Extraction and Retrieval of Temporal Information
Llidó Escrivá, Dolores Maria 20 September 2002
This thesis aims to show how Information Retrieval (IR) systems and Topic Detection and Tracking (TDT) systems improve when a temporal component, automatically extracted from the text, is added; we call this component the event period. This attribute represents the span of time over which the main event reported in each document takes place. To this end, the thesis covers the following objectives:
* Definition of a time model to represent and manipulate the temporal references that appear in a text.
* Development of an application for extracting linguistic temporal expressions and resolving the absolute interval they reference in the Gregorian calendar.
* Implementation of a system for the automatic extraction of the event period.
* Modification of current IR and TDT systems to include the temporal information extracted with the above tools.
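The event-period notion above can be illustrated with a minimal sketch: resolve the date expressions in a document to Gregorian dates, then take the event period as the span from the earliest to the latest referenced date. Real temporal taggers handle far richer expressions (relative dates, durations, underspecified references); the ISO-only regex below is purely illustrative and not the thesis's system.

```python
# Toy event-period extraction: earliest-to-latest date mentioned in a document.
import re
from datetime import date

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")  # ISO dates only, for the sketch

def event_period(text):
    """Return (earliest, latest) Gregorian date mentioned in the text, or None."""
    dates = [date(int(y), int(m), int(d)) for y, m, d in DATE_RE.findall(text)]
    if not dates:
        return None
    return (min(dates), max(dates))

doc = ("The storm formed on 2002-09-14, intensified on 2002-09-16, "
       "and dissipated by 2002-09-20.")
period = event_period(doc)
```

An IR or TDT system could then index `period` alongside the document's terms and rank or cluster by temporal overlap.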
|
6 |
Topic-Based Aggregation of Questions in Social Media
Muthmann, Klemens 25 October 2013 (links) (PDF)
Software produced by big companies such as SAP is often feature-rich, very expensive, and thus only affordable by other big companies. It usually takes months and specially trained consultants to install and manage such software. However, as vendors move to other market segments featuring smaller companies, different requirements arise. It is not possible for small or medium-sized companies to spend as much money on business software solutions as big companies do. They especially cannot afford to hire expensive consultants. On the other hand, it is not economic for the vendor to provide the personnel free of charge. One solution to this dilemma is bundling all customer support cases on dedicated Web platforms, such as customer support forums. SAP, for example, has the SAP Community Network. This has the additional benefit that customers can help each other.
(...)
|
7 |
DETECTION OF EMERGING DISRUPTIVE FIELDS USING ABSTRACTS OF SCIENTIFIC ARTICLES
Vorgianitis, Georgios January 2017 (has links)
With the significant advancements taking place in the last three decades in the field of Information Technology (IT), we are witnesses of an era unprecedented by the standards mankind was used to for centuries. Having access to a huge amount of data almost instantly entails certain advantages. One of these is the ability to observe in which segments of their expertise scientists focus their research. That kind of knowledge, if properly appraised, could hold the key to explaining what the new directions of the applied sciences will be, and thus could help construct a "map" of future developments from the research and development labs of industries worldwide. Though the above statement may be considered too "futuristic", there have already been documented attempts in the literature that have successfully used vast amounts of scientific data to outline future scientific trends and thus scientific discoveries. The purpose of this research is to apply a pioneering method of modeling text corpora that has already been used to map the history of scientific discovery, Latent Dirichlet Allocation (LDA), and to evaluate its usability in detecting emerging research trends from only the "Abstracts" of a collection of scientific articles. To do that, an experimental set is utilized and the process is repeated over three experimental runs. The results, although not the ones that would validate the hypothesis, show that with certain improvements to the processing the hypothesis could be confirmed.
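To make the LDA step concrete, here is a minimal collapsed Gibbs sampler over a toy "abstract" corpus. This is a sketch of the model family, not the thesis's setup: real studies use tuned libraries (e.g. gensim or MALLET), far larger corpora, and careful preprocessing, and the hyper-parameters below are arbitrary.

```python
# Minimal collapsed Gibbs sampler for LDA on tokenized documents.
import random
from collections import defaultdict

def lda(docs, n_topics=2, iters=300, alpha=0.1, beta=0.01, seed=42):
    random.seed(seed)
    vocab = {w for d in docs for w in d}
    V = len(vocab)
    z = [[random.randrange(n_topics) for _ in d] for d in docs]  # topic labels
    ndk = [[0] * n_topics for _ in docs]               # doc -> topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic -> word counts
    nk = [0] * n_topics                                # words per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove the current assignment, then resample
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = random.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    # Top words per topic as a crude readout of the discovered themes.
    return [sorted(nkw[t], key=nkw[t].get, reverse=True)[:2] for t in range(n_topics)]

abstracts = [
    ["neural", "network", "training"], ["neural", "network", "layers"],
    ["market", "price", "trading"], ["market", "price", "risk"],
]
topics = lda(abstracts)
```

Detecting an *emerging* field would then amount to tracking how the weight of such topics shifts across publication years, which is the part the thesis evaluates.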
|
8 |
Using the organizational and narrative thread structures in an e-book to support comprehension
Sun, Yixing January 2007 (links)
Stories, themes, concepts and references are organized structurally and purposefully in most books. A person reading a book needs to understand themes and concepts within the context. Schank’s Dynamic Memory theory suggested that building on existing memory structures is essential to cognition and learning. Pirolli and Card emphasized the need to provide people with an independent and improved ability to access and understand information in their information seeking activities. Through a review of users’ reading behaviours and of existing e-Book user interfaces, we found that current e-Book browsers provide minimal support for comprehending the content of large and complex books. Readers of an e-Book need user interfaces that present and relate the organizational and narrative structures, and moreover, reveal the thematic structures. This thesis addresses the problem of providing readers with effective scaffolding of multiple structures of an e-Book in the user interface to support reading for comprehension. Recognising a story or topic as the basic unit in a book, we developed novel story segmentation techniques for discovering narrative segments, and adapted story linking techniques for linking narrative threads in semi-structured linear texts of an e-Book. We then designed an e-Book user interface to present the complex structures of the e-Book, as well as to assist the reader to discover these structures. We designed and developed evaluation methodologies to investigate reading and comprehension in e-Books, in order to assess the effectiveness of this user interface. We designed semi-directed reading tasks using a Story-Theme Map, and a set of corresponding measurements for the answers. We conducted user evaluations with book readers. Participants were asked to read stories, to browse and link related stories, and to identify major themes of stories in an e-Book. This thesis reports the experimental design and results in detail. 
The results confirmed that the e-Book interface helped readers perform reading tasks more effectively. The most important and interesting finding is that the interface proved most helpful to novice readers who had little background knowledge of the book. In addition, each component supporting the user interface was evaluated separately in a laboratory setting, and these results too are reported in the thesis.
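The story-segmentation component can be sketched in the spirit of lexical-cohesion methods such as TextTiling (Hearst, 1997), which segment a linear text where the vocabulary of adjacent windows diverges. The thesis's "novel story segmentation techniques" go beyond this; the window size and threshold below are illustrative assumptions, not the author's parameters.

```python
# TextTiling-style segmentation: cut where adjacent windows share little vocabulary.
import math
import re
from collections import Counter

def tokens(sentence):
    return re.findall(r"[a-z]+", sentence.lower())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a.keys() & b.keys())
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def segment(sentences, window=2, threshold=0.2):
    """Return sentence indices where a new narrative segment starts."""
    boundaries = [0]
    for i in range(window, len(sentences) - window + 1):
        left = Counter(w for s in sentences[i - window:i] for w in tokens(s))
        right = Counter(w for s in sentences[i:i + window] for w in tokens(s))
        if cosine(left, right) < threshold:  # low lexical cohesion -> boundary
            boundaries.append(i)
    return boundaries

sents = [
    "The knight rode to the castle.", "The castle gates were shut.",
    "Meanwhile the merchant counted his coins.", "His coins filled three chests.",
]
cuts = segment(sents)
```

Here the vocabulary shift between the castle sentences and the merchant sentences produces a boundary before sentence 2, giving two story segments that a Story-Theme Map could then link and label.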
|
9 |
Um data warehouse de publicações científicas: indexação automática da dimensão tópicos de pesquisa dos data marts / A Data warehouse for scientific publications: automatic indexing of the research topic dimension for using in data marts
Kanashiro, Augusto 04 May 2007 (links)
This MSc dissertation is part of the project of an Intelligent Tool for Research Supporting (FIP), under development at the Laboratory of Computational Intelligence at ICMC-USP. The tool was proposed to retrieve, organize, and mine large sets of scientific documents in the field of computer science. In this context, a repository of articles becomes necessary: a Data Warehouse that integrates and stores all the information extracted from documents retrieved from different personal and institutional web pages and from article repositories on the Web. Appropriately stored data is decisive for supporting on-line analytical processing (OLAP) and data mining processes.
Thus, the main goal of this MSc research was to design the FIP Data Warehouse (DW). Additionally, we carried out experiments with Data Mining and Machine Learning techniques in order to automate the indexing of the information and documents stored in the data warehouse (topic detection). Data marts for multidimensional queries were designed to let researchers evaluate the trends and evolution of research topics.
|
10 |
Analyse des médias sociaux de santé pour évaluer la qualité de vie des patientes atteintes d’un cancer du sein / Analysis of social health media to assess the quality of life of breast cancer patients
Tapi Nzali, Mike Donald 28 September 2017 (links)
In 2015, the number of new cases of breast cancer in France was 54,000. The survival rate five years after diagnosis is 89%. While modern treatments can save lives, some are difficult to bear. Many clinical research projects have therefore focused on quality of life (QoL), which refers to the perception patients have of their diseases and treatments. QoL is a relevant clinical evaluation criterion for assessing the advantages and disadvantages of treatments, both for the patient and for the health system. In this thesis, we focus on the stories patients tell in social media about their health, in order to better understand their perception of QoL. This mode of communication is very popular among patients because it is associated with great freedom of speech, induced in particular by the anonymity these websites provide.
The originality of this thesis is to use and extend social media mining methods for the French language. The main contributions of this work are: (1) construction of a patient/doctor vocabulary; (2) detection of the topics discussed by patients; (3) sentiment analysis of the messages posted by patients; and (4) combination of these contributions to quantify patient discourse.
First, we used patients' texts to construct a patient/doctor vocabulary specific to the field of breast cancer, collecting various types of non-expert expressions related to the disease and linking them to the biomedical terms used by health professionals. We combined several methods from the literature based on linguistic and statistical approaches. To evaluate the resulting relations, we used automatic and manual validations. We then transformed the constructed resource into a human- and machine-readable format by creating a SKOS ontology, which was integrated into the BioPortal platform.
Second, we used and extended methods from the literature to detect the different topics discussed by patients in social media and to relate them to the functional and symptomatic dimensions of the QoL self-questionnaires (EORTC QLQ-C30 and EORTC QLQ-BR23). To detect the topics, we applied the unsupervised LDA model with relevant preprocessing. We then applied a customized Jaccard coefficient to automatically compute the similarity between the topics detected with LDA and the items of the QoL questionnaires, identifying new topics complementary to those already present in the questionnaires. This work shows that data from health forums can be used to conduct a complementary study of QoL.
Third, we focused on sentiment extraction (polarity and emotions). We evaluated different methods and resources for sentiment classification in French. These experiments determined which features are useful for sentiment classification across different types of texts, including texts from health forums. Finally, we used the methods proposed in this thesis to quantify the topics and sentiments identified in health social media.
Overall, this work opens promising perspectives on various social media analysis tasks for the French language, and in particular for studying patients' QoL from health forums.
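The topic-to-questionnaire matching step can be sketched with a plain Jaccard coefficient over word sets. The thesis uses a *customized* Jaccard whose exact weighting the abstract does not specify, so the plain version below is an assumption, and the topic words and questionnaire item keywords are invented toy stand-ins for EORTC QLQ-C30 items.

```python
# Match an LDA topic to the most similar questionnaire item via Jaccard similarity.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def best_item(topic_words, items):
    """Return the questionnaire item whose keywords best overlap the topic."""
    return max(items, key=lambda item: jaccard(topic_words, items[item]))

items = {  # toy stand-ins for QoL questionnaire item keywords
    "fatigue": ["tired", "rest", "weak", "fatigue"],
    "pain": ["pain", "ache", "hurt"],
}
topic = ["tired", "sleep", "fatigue", "exhausted"]
match = best_item(topic, items)
```

Topics whose best Jaccard score falls below some threshold against every item would be the candidates for the "new emerging topics" the thesis reports as complementary to the questionnaires.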
|