11 |
Sentence Compression by Removing Recursive Structure from Parse Tree / Matsubara, Shigeki, Kato, Yoshihide, Egawa, Seiji 04 December 2008 (has links)
PRICAI 2008: Trends in Artificial Intelligence 10th Pacific Rim International Conference on Artificial Intelligence, Hanoi, Vietnam, December 15-19, 2008. Proceedings
12 |
Automatic Video Categorization And Summarization / Demirtas, Kezban 01 September 2009 (has links) (PDF)
In this thesis, we perform automatic video categorization and summarization using the subtitles of videos. We propose two methods for video categorization. The first makes unsupervised categorization by applying natural language processing techniques to video subtitles, using the WordNet lexical database and WordNet Domains. The method starts with text preprocessing; then a keyword extraction algorithm and a word sense disambiguation method are applied. The WordNet domains corresponding to the correct senses of the keywords are extracted, and the video is assigned a category label based on these domains. The second method extracts the WordNet domains of a video in the same way but performs categorization with a learning module. Experiments with documentary videos give promising results in discovering the correct categories of videos.
Video summarization algorithms present condensed versions of a full-length video by identifying its most significant parts. We propose a video summarization method that uses the subtitles of videos together with text summarization techniques: we identify significant sentences in a video's subtitles using text summarization techniques and then compose a video summary by finding the video parts corresponding to these summary sentences.
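The alignment step described in this abstract can be sketched as follows. This is a hypothetical illustration of the idea, not the thesis's implementation: summary sentences are matched back to the subtitle entries that contain them, and the resulting time ranges are merged into playable segments. All function names and the subtitle format are assumptions.

```python
# Hypothetical sketch: given sentences selected by a text summarizer, recover
# the (start, end) times of the subtitle entries containing them, then merge
# overlapping segments into a final shot list for the video summary.

def align_summary_to_video(summary_sentences, subtitles, padding=1.0):
    """subtitles: list of (start_sec, end_sec, text) tuples."""
    segments = []
    for sentence in summary_sentences:
        for start, end, text in subtitles:
            if sentence in text:
                segments.append((max(0.0, start - padding), end + padding))
                break
    # Merge overlapping or adjacent segments so the summary plays smoothly.
    segments.sort()
    merged = []
    for start, end in segments:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With zero padding, two summary sentences drawn from non-adjacent subtitle entries yield two separate video segments.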
13 |
Text Summarization Using Latent Semantic Analysis / Ozsoy, Makbule Gulcin 01 February 2011 (has links) (PDF)
Text summarization addresses the problem of presenting the information needed by a user in a compact form. There are different approaches to creating well-formed summaries in the literature. One of the newer methods in text summarization is Latent Semantic Analysis (LSA). In this thesis, different LSA-based summarization algorithms are explained and two new LSA-based summarization algorithms are proposed. The algorithms are evaluated on Turkish and English documents, and their performances are compared using their ROUGE scores.
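One classical LSA-based selection scheme (in the style of Gong and Liu) can be sketched in a few lines: build a term-by-sentence frequency matrix, take its SVD, and for each of the top singular vectors pick the sentence with the largest weight. This is an illustration of the family of algorithms the abstract refers to, not the thesis's specific proposals; it assumes NumPy is available.

```python
import numpy as np

def lsa_summary(sentences, num_sentences=2):
    # Term-by-sentence raw frequency matrix.
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    a = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            a[vocab.index(w), j] += 1.0
    # V^T has one row per latent topic; columns index sentences.
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    chosen = []
    for topic in vt:
        j = int(np.argmax(np.abs(topic)))  # strongest sentence for this topic
        if j not in chosen:
            chosen.append(j)
        if len(chosen) == num_sentences:
            break
    return [sentences[j] for j in sorted(chosen)]
```

Each selected sentence represents one dominant latent topic, which is why this style of summary tends to cover distinct themes rather than repeat the strongest one.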
14 |
Generalized Probabilistic Topic and Syntax Models for Natural Language Processing / Darling, William Michael 14 September 2012 (has links)
This thesis proposes a generalized probabilistic approach to modelling document collections along the combined axes of both semantics and syntax. Probabilistic topic (or semantic) models view documents as random mixtures of unobserved latent topics which are themselves represented as probability distributions over words. They have grown immensely in popularity since the introduction of the original topic model, Latent Dirichlet Allocation (LDA), in 2003, and have seen successes in computational linguistics, bioinformatics, political science, and many other fields. Furthermore, the modular nature of topic models allows them to be extended and adapted to specific tasks with relative ease. Despite these successes, however, there remains a gap in combining axes of information from different sources and in developing models that are as useful as possible for specific applications, particularly in Natural Language Processing (NLP). The main contributions of this thesis are two-fold. First, we present generalized probabilistic models (both parametric and nonparametric) that are semantically and syntactically coherent and contain many simpler probabilistic models as special cases. Our models are consistent along both axes of word information in that an LDA-like component sorts words that are semantically related into distinct topics and a Hidden Markov Model (HMM)-like component determines the syntactic parts of speech of words, so that we can group words that are both semantically and syntactically affiliated in an unsupervised manner, leading to such groups as verbs about health care and nouns about sports. Second, we apply our generalized probabilistic models to two NLP tasks. Specifically, we present new approaches to automatic text summarization and unsupervised part-of-speech (POS) tagging using our models and report results commensurate with the state of the art in these two sub-fields.
Our successes demonstrate the general applicability of our modelling techniques to important areas in computational linguistics and NLP.
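The generative story such a combined model tells can be illustrated with a toy simulation. The sketch below is a deliberately simplified, hypothetical two-state version (it is not Darling's parametric or nonparametric models): an HMM chooses a syntactic class for each word position, and only "content" positions draw their word from the document's topic mixture, while "function" positions draw from a class-specific distribution.

```python
import random

random.seed(0)

# Toy vocabularies; real models learn these distributions from data.
TOPICS = {0: ["health", "care", "doctor"], 1: ["sports", "team", "goal"]}
FUNCTION_WORDS = ["the", "of", "and"]

def generate_document(length, topic_weights):
    words, state = [], "function"
    for _ in range(length):
        # Two-state HMM over syntactic classes: switch class with prob. 0.7.
        if random.random() < 0.7:
            state = "content" if state == "function" else "function"
        if state == "content":
            # Content words come from the document-level topic mixture.
            topic = random.choices(list(TOPICS), weights=topic_weights)[0]
            words.append(random.choice(TOPICS[topic]))
        else:
            words.append(random.choice(FUNCTION_WORDS))
    return words
```

Inference inverts this story: given only the words, the model recovers topic assignments and syntactic classes jointly, which is what yields clusters like "verbs about health care."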
15 |
Modelo Cassiopeia como avaliador de sumários automáticos: aplicação em um corpus educacional / Aguiar, Luís Henrique Gonçalves de 05 December 2017 (has links)
Previous issue date: 2017 / Dissertação (Mestrado Profissional), Programa de Pós-Graduação em Educação, Universidade Federal dos Vales do Jequitinhonha e Mucuri, 2017. / Considering the large amount of textual information currently available, especially on the web,
it is becoming increasingly difficult for the user to access and assimilate this content. In this context, it becomes necessary to find techniques that can transform this large amount of information into useful and organized knowledge. One alternative for moderating this problem is to reduce the volume of information by producing abstracts of the original texts through automatic summarization (AS). Automatic text summarization consists of the automatic production of summaries from one or more source texts, such that the summary contains the most relevant information of the source. The evaluation of summaries is an important task in the field of automatic text summarization. The most intuitive approach is human evaluation, but it is costly and unproductive. Another alternative is automatic evaluation; several evaluators have been proposed, the best-known and most widely used being ROUGE (Recall-Oriented Understudy for Gisting Evaluation). A limiting factor in ROUGE evaluation is its use of human reference summaries, which restricts the language and domain covered and requires time-consuming and expensive human work. In view of these difficulties, this dissertation presents the Cassiopeia model as a new evaluation method. The model is a hierarchical text clusterer that uses summarization in its pre-processing stage, so that clustering quality is positively influenced by summarization quality. The simulations performed in this work showed that the evaluations produced by Cassiopeia are similar to those of the ROUGE tool. On the other hand, using the Cassiopeia model as an evaluator of automatic summaries showed some advantages, the main ones being that no human reference summary is needed in the evaluation process and that the method is independent of domain and language.
16 |
Automatic Text Summarization Using Importance of Sentences for Email Corpus / January 2015 (has links)
abstract: With the advent of the Internet, the amount of data added online is increasing at an enormous rate. Though search engines use IR techniques to serve users' search requests, the results are often not effective for the search query: the user has to go through several webpages before reaching the one he or she wanted. This problem of information overload can be addressed with automatic text summarization. Summarization is the process of producing an abridged version of a document so that a user can get a quick view of what the document is about. Email threads from W3C are used in this system. Apart from common IR features like term frequency and inverse document frequency, Term Rank, a variation of PageRank based on a graph model that can cluster words with respect to word ambiguity, is implemented. Term Rank also considers the possibility of co-occurrence of words within the corpus and evaluates the rank of a word accordingly. Sentences of email threads are ranked according to these features and summaries are generated. The system implements the concept of pyramid evaluation in content selection, and it can be considered a framework for unsupervised learning in text summarization. / Dissertation/Thesis / Masters Thesis Computer Science 2015
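The TF-IDF half of such a pipeline can be sketched as follows; Term Rank itself is the thesis's graph-based extension and is not reproduced here, so this is only a hedged illustration of the baseline feature: score each sentence by the average TF-IDF weight of its words and keep the top-scoring ones.

```python
import math
from collections import Counter

def tfidf_sentence_scores(sentences):
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences does each word occur?
    df = Counter(w for d in docs for w in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length-normalized sum of TF-IDF weights.
        score = sum(tf[w] * math.log(n / df[w]) for w in tf) / max(len(d), 1)
        scores.append(score)
    return scores

def summarize(sentences, k=1):
    scores = tfidf_sentence_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]  # keep original order
```

Sentences made of rare, specific words outrank those dominated by common words, which is the intuition the richer features build on.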
17 |
Keeping an Eye on the Context : An Eye Tracking Study of Cohesion Errors in Automatic Text Summarization / Med ett öga på sammanhanget : En ögonrörelsestudie av kohesionsfel i automatiska textsammanfattningar / Rennes, Evelina January 2013 (has links)
Automatic text summarization is a growing field due to the modern world's Internet-based society, but automatically creating perfect summaries is not easy, and cohesion errors are common. Using an eye-tracking camera, this thesis studies the nature of four different types of cohesion errors occurring in summaries. A total of 23 participants read and rated four different texts and marked the most difficult areas of each text. Statistical analysis of the data revealed that absent cohesion or context and broken anaphoric reference (pronouns) caused some disturbance in reading, but that the impact was restricted to the effort of reading rather than the comprehension of the text. Erroneous anaphoric reference (pronouns) was not detected by the participants, which poses a problem for automatic text summarizers, and other potentially disturbing factors were detected. Finally, the question of the meaningfulness of keeping absent cohesion or context as a separate error type was raised.
18 |
FINE-TUNE A LANGUAGE MODEL FOR TEXT SUMMARIZATION (BERTSUM) ON EDGAR-CORPUS / Niu, Yijie January 2022 (has links)
Financial reports include a lot of useful information for investors, but extracting this information is time-consuming, and we consider text summarization a feasible remedy. In this thesis, we implement BERTSUM, a state-of-the-art language model for text summarization, and evaluate the results with ROUGE metrics. The experiment was carried out on a novel, large-scale financial dataset called EDGAR-CORPUS. BERTSUM with a Transformer achieves the best performance, with a ROUGE-L F1 score of 9.26%. We also hand-picked some model-generated summaries that contained common errors and investigated the causes. The results were then compared to previous research; the ROUGE-L F1 value in the previous study was much higher than ours, which we attribute to the length of the financial reports.
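A score like the reported ROUGE-L F1 is based on the longest common subsequence (LCS) between candidate and reference summaries. The sketch below is a plain re-implementation for illustration; published results use the official ROUGE toolkit, which adds stemming and other preprocessing.

```python
def lcs_length(a, b):
    # Classic dynamic-programming LCS over token sequences.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

Because recall divides by the reference length, very long reference summaries (as in financial reports) drag the F1 score down even when the candidate text is accurate, consistent with the low absolute scores discussed above.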
19 |
Summarization and keyword extraction on customer feedback data : Comparing different unsupervised methods for extracting trends and insight from text / Skoghäll, Therése, Öhman, David January 2022 (has links)
Polestar has, during the last couple of months, more than doubled its amount of customer feedback, and the forecast is that this amount will increase even more. Manually reading this feedback is expensive and time-consuming, so there is a need to analyse it automatically. The company wants to understand the customer and extract the trends and topics that concern consumers in order to improve the customer experience. Over the last couple of years, as Natural Language Processing has developed immensely, new state-of-the-art language models have pushed the boundaries on all types of benchmark tasks. In this thesis, three different extractive summarization models and three different keyword extraction methods were tested and evaluated, based on two quantitative measures and human evaluation, for extracting information from text. This master thesis has shown that extractive summarization models with a Transformer-based text representation are best at capturing the context in a text. Based on the quantitative results and the company's needs, TextRank with a Transformer-based embedding was chosen as the final extractive summarization model. For keyword extraction, the best overall model was YAKE!, based on the quantitative measures and human validation.
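The TextRank half of the chosen pipeline can be sketched as follows. Here sentence similarity is plain word overlap (Jaccard); the thesis instead uses Transformer-based sentence embeddings, so treat this as a structural illustration only, with all names assumed.

```python
def textrank(sentences, damping=0.85, iterations=50):
    def similarity(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / (len(wa | wb) or 1)

    n = len(sentences)
    # Undirected similarity graph with no self-loops.
    sim = [[similarity(a, b) if i != j else 0.0
            for j, b in enumerate(sentences)] for i, a in enumerate(sentences)]
    ranks = [1.0 / n] * n
    for _ in range(iterations):  # power iteration, as in PageRank
        new = []
        for i in range(n):
            incoming = sum(
                sim[j][i] / (sum(sim[j]) or 1) * ranks[j] for j in range(n))
            new.append((1 - damping) / n + damping * incoming)
        ranks = new
    return ranks
```

Swapping the `similarity` function for cosine similarity between sentence embeddings gives the Transformer-based variant the thesis selected; the ranking machinery stays the same.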
20 |
Numerical Optimization Methods based on Discrete Structure for Text Summarization and Relational Learning / 文書要約と関係学習のための離散構造に基づいた数値最適化法 / Nishino, Masaaki 24 September 2014 (has links)
Kyoto University / 0048 / New-system doctorate by coursework / Doctor of Informatics / Doctorate No. 18613 (Kō) / Informatics No. 537 / 新制||情||95 (University Library) / 31513 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Examination committee) Prof. Akihiro Yamamoto, Prof. Sadao Kurohashi, Prof. Tatsuya Akutsu / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM