Global ETD Search

41	Language Models as Evaluators : A Novel Framework for Automatic Evaluation of News Article Summaries / Språkmodeller som Utvärderare : Ett Nytt Ramverk för Automatiserad Utvärdering av Nyhetssammanfattningar Helgesson Hallström, Celine January 2023 (has links) The advancements in abstractive summarization using Large Language Models (LLMs) have brought with them new challenges in evaluating the quality and faithfulness of generated summaries. This thesis explores a human-like automated method for evaluating news article summaries. By leveraging two LLMs with instruction-following capabilities (GPT-4 and Claude), the aim is to examine to what extent the quality of summaries can be measured by the predictions of an LLM. The proposed framework involves defining specific attributes of desired summaries, which are used to design generation prompts and evaluation questions. These questions are presented to the LLMs in natural language during evaluation to assess various summary qualities. To validate the effectiveness of the evaluation method, an adversarial approach is employed, in which a dataset comprising summaries with distortions related to various summary attributes is generated. In an experiment, the two LLMs evaluate the adversarial dataset, and their ability to detect known distortions is measured and analyzed. The findings suggest that the LLM-based evaluations demonstrate promise in detecting binary qualitative issues, such as incorrect facts. However, the reliability of the zero-shot evaluation varies depending on the evaluating LLM and the specific questions used. Further research is required to validate the accuracy and generalizability of the results, particularly in subjective dimensions where the results of this thesis are inconclusive. Nonetheless, this thesis provides insights that can serve as a foundation for future advancements in the field of automatic text evaluation. 
/ De framsteg som gjorts inom abstrakt sammanfattning med hjälp av stora språkmodeller (LLM) har medfört nya utmaningar när det gäller att utvärdera kvaliteten och sanningshalten hos genererade sammanfattningar. Detta examensarbete utforskar en mänskligt inspirerad automatiserad metod för att utvärdera sammanfattningar av nyhetsartiklar. Genom att dra nytta av två LLM:er med instruktionsföljande förmågor (GPT-4 och Claude) är målet att undersöka i vilken utsträckning kvaliteten av sammanfattningar kan bestämmas med hjälp av språkmodeller som utvärderare. Det föreslagna ramverket innefattar att definiera specifika egenskaper hos önskade sammanfattningar, vilka används för att utforma genereringsuppmaningar (prompts) och utvärderingsfrågor. Dessa frågor presenteras för språkmodellerna i naturligt språk under utvärderingen för att bedöma olika kvaliteter hos sammanfattningar. För att validera utvärderingsmetoden används ett kontradiktoriskt tillvägagångssätt där ett dataset som innefattar sammanfattningar med förvrängningar relaterade till olika sammanfattningsattribut genereras. I ett experiment utvärderar de två språkmodellerna de motstridiga sammanfattningar, och deras förmåga att upptäcka kända förvrängningar mäts och analyseras. Resultaten tyder på att språkmodellerna visar lovande resultat vid upptäckt av binära kvalitativa problem, såsom faktafel. Dock varierar tillförlitligheten hos utvärderingen beroende på vilken språkmodell som används och de specifika frågorna som ställs. Ytterligare forskning krävs för att validera tillförlitligheten och generaliserbarheten hos resultaten, särskilt när det gäller subjektiva dimensioner där resultaten är osäkra. Trots detta ger detta arbete insikter som kan utgöra en grund för framtida framsteg inom området för automatisk textutvärdering. 
Natural Language Processing Large Language Models Automatic Text Evaluation Text Summarization Multilingualism Naturlig Språkbehandling Stora Språkmodeller Automatisk Textutvärdering Textsammanfattning Flerspråkighet Computer and Information Sciences Data- och informationsvetenskap
42	Evaluating Text Summarization Models on Resumes : Investigating the Quality of Generated Resume Summaries and their Suitability as Resume Introductions / Utvärdering av Textsammanfattningsmodeller för CV:n : Undersökning av Kvaliteten på Genererade CV-sammanfattningar och deras Lämplighet som CV-introduktioner Krohn, Amanda January 2023 (has links) This thesis aims to evaluate different abstractive text summarization models and techniques for summarizing resumes. It has two main objectives: to investigate the models’ performance on resume summarization and to assess the suitability of the generated summaries as resume introductions. Although automatic abstractive text summarization has gained traction in various areas, its application in the resume domain has not yet been explored. Resumes present a unique challenge for abstractive summarization due to their diverse style, content, and length. To address these challenges, three state-of-the-art pre-trained text generation models (BART, T5, and ProphetNet) were selected. Additionally, two approaches that can handle longer resumes were investigated. The first approach, named LongBART, modified the BART architecture by incorporating the Longformer’s self-attention into the encoder. The second approach, named HybridBART, used an extractive-then-abstractive summarization strategy. The models were fine-tuned on a dataset of 653 resume-introduction pairs and were evaluated using automatic metrics as well as two types of human evaluation: a survey and expert interviews. None of the models demonstrated superiority across all criteria and evaluation metrics. However, the survey responses indicated that LongBART showed promising results, receiving the highest scores in three out of five criteria. On the other hand, ProphetNet consistently received the lowest scores across all criteria in the survey, and across all automatic metrics. 
Expert interviews emphasized that the generated summaries cannot be considered correct summaries due to the presence of hallucinated personal attributes. However, there is potential for using the generated texts as resume introductions, given that measures are taken to ensure the hallucinated personal attributes are sufficiently generic. / Denna avhandling utvärderar olika modeller och tekniker för automatisk textsammanfattning för sammanfattning av CV:n. Avhandlingen har två mål: att undersöka modellernas prestanda på sammanfattning av CV:n och bedöma lämpligheten att använda de genererade sammanfattningar som CV-introduktioner. Även om automatisk abstrakt textsummering har fått fotfäste inom olika sammanhang är dess tillämpning inom CV-domänen ännu outforskad. CV:n utgör en unik utmaning för abstrakt textsammanfattning på grund av deras varierande stil, innehåll och längd. För att hantera dessa utmaningar valdes tre av de främsta förtränade modellerna inom textgenerering: BART, T5 och ProphetNet. Dessutom undersöktes två extra metoder som kan hantera längre CV:n. Det första tillvägagångssättet, kallat LongBART, modifierade BART-arkitekturen genom att inkludera självuppmärksamhet från Longformer-arkitekturen i kodaren. Det andra tillvägagångssättet, kallat HybridBART, använde en extraktiv-sen-abstraktiv sammanfattningsstrategi. Modellerna finjusterades med ett dataset med 653 CV-introduktionspar och utvärderades med hjälp av automatiska mått, samt två typer av mänsklig utvärdering: en enkätundersökning och intervjuer med experter. Ingen av modellerna visade överlägsenhet på alla kriterier och utvärderingsmått. Dock indikerade enkätsvaren att LongBART visade lovande resultat, genom att få högst poäng i tre av fem utvärderingskategorier. Å andra sidan fick ProphetNet lägst poäng i samtliga utvärderingskategorier, samt lägst poäng i alla automatiska mätningar. 
Expertintervjuer framhävde att de genererade sammanfattningarna inte kan anses vara pålitliga som fristående sammanfattningar på grund av förekomsten av hallucinerade personliga egenskaper. Trots detta finns det potential att använda dessa sammanfattningar som introduktioner, under förutsättningen att åtgärder vidtas för att säkerställa att hallucinerade personliga attribut är tillräckligt generiska. Natural language processing Abstractive text summarization Transformer architecture Fine-tuning Resumes Språkteknologi Abstrakt textsammanfattning Transformer-arkitektur Finjustering CV Computer Sciences Datavetenskap (datalogi)
43	Adaptive Summarization for Low-resource Domains and Algorithmic Fairness Keymanesh, Moniba January 2022 (has links) No description available. Computer Science Information Systems Linguistics Artificial Intelligence automatic text summarization algorithmic fairness legal language processing controllable summarization low-resource natural language processing
44	Improving information gathering for IT experts. : Combining text summarization and individualized information recommendation. Bergenudd, Anton January 2022 (has links) Information gathering and information overload is an ever growing topic of concern for Information Technology (IT) experts. The amount of information dealt with on an everyday basis is large enough to take up valuable time having to scatter through it all to find the relevant information. As for the application area of IT, time is directly related to money as having to waste valuable production time in information gathering and allocation of human resources is a direct loss of profits for any given company. Two issues are mainly addressed through this thesis: texts are too lengthy and the difficulty of finding relevant information. Through the use of Natural Language Processes (NLP) methods such as topic modelling and text summarization, a proposed solution is constructed in the form of a technical basis which can be implemented in most business areas. An experiment along with an evaluation session is setup in order to evaluate the performance of the technical basis and enforce the focus of this paper, namely ”How effective is text summarization combined with individualized information recommendation in improving information gathering of IT experts?”. Furthermore, the solution includes a construction of user profiles in an attempt to individualize content and theoretically present more relevant information. The results for this project are affected by the substandard quality and magnitude of data points, however positive trends are discovered. It is stated that the use of user profiles further enhances the amount of relevant articles presented by the model along with the increasing recall and precision values per iteration and accuracy per number of updates made per user. 
Not enough time is spent as for the extent of the evaluation process to confidently state the validity of the results more than them being inconsistent and insufficient in magnitude. However, the positive trends discovered creates further speculations on if the project is given enough time and resources to reach its full potential. Essentially, one can theoretically improve information gathering by summarizing texts combined with individualization. Text summarization information gathering individualization topic modelling natural language processes profiling. Computer Sciences Datavetenskap (datalogi)
45	Information Theoretic Approach To Extractive Text Summarization Ravindra, G 08 1900 (has links) Automatic text summarization techniques, which can reduce a source text to a summary text by content generalization or selection, have assumed significance in recent times due to the ever expanding information explosion created by the World Wide Web. Summaries generated by generalization of information are called abstracts and those generated by selection of portions of text (sentences, phrases etc.) are called extracts. Further, summaries could be generated for each document separately, or multiple documents could be summarized together to produce a single summary. The challenges in making machines generate extracts or abstracts are primarily due to the lack of understanding of human cognitive processes. Summaries generated by humans seem to be influenced by their moral, emotional and ethical stance on the subject and their background knowledge of the content being summarized. These characteristics are hardly understood and difficult to model mathematically. Further, automatic summarization is very much handicapped by limitations of existing computing resources and lack of good mathematical models of cognition. In view of these, the role of rigorous mathematical theory in summarization has been limited hitherto. The research reported in this thesis is a contribution towards bringing the power of well-established concepts of information theory to the field of summarization. Contributions of the Thesis: The specific focus of this thesis is on extractive summarization. Its domain spans multi-document summarization as well as single document summarization. In the whole thesis the words "summarization" and "summary" imply extract generation and sentence extracts respectively. In this thesis, two novel summarizers referred to as ESCI (Extractive Summarization using Collocation Information) and De-ESCI (Dictionary enhanced ESCI) have been proposed. 
In addition, an automatic summary evaluation technique called DeFuSE (Dictionary enhanced Fuzzy Summary Evaluator) has also been introduced. The mathematical basis for the evolution of the scoring scheme proposed in this thesis and its relationship with other well-known summarization algorithms such as Latent Semantic Indexing (LSI) is also derived. The work detailed in this thesis is specific to the domain of extractive summarization of unstructured text without taking into account the data set characteristics such as the positional importance of sentences. This is to ensure that the summarizer works well for a broad class of documents, and to keep the proposed models as generic as possible. Central to the proposed work is the concept of "Collocation Information of a word", its quantification and application to summarization. "Collocation Information" (CI) is the amount of information (Shannon’s measure) that a word and its collocations together contribute to the total information in the document(s) being summarized. The CI of a word has been computed using Shannon’s measure for information using a joint probability distribution. Further, a base value of CI called "Discrimination Threshold" (DT) has also been derived. To determine DT, sentences from a large collection of documents covering various topics including the topic covered by the document(s) being summarized were broken down into sequences of word collocations. The number of possible neighbors for a word within a specified collocation window was determined. This number has been called the "cardinality of the collocating set" and is represented as |ℵ(w)|. It is proved that if |ℵ(w)| determined from this large document collection for any word w is fixed, then the maximum value of the CI for a word w is proportional to |ℵ(w)|. This constrained maximum is the "Discrimination Threshold" and is used as the base value of CI. 
Experimental evidence detailed in this thesis shows that sentences containing words with CI greater than DT are most likely to be useful in an extract. Words in every sentence of the document(s) being summarized have been assigned scores based on the difference between their current value of CI and their respective DT. Individual word scores have been summed to derive a score for every sentence. Sentences are ranked according to their scores and the first few sentences in the rank order have been selected as the extract summary. Redundant and semantically similar sentences have been excluded from the selection process using a simple similarity detection algorithm. This novel method for extraction has been called ESCI in this thesis. In the second part of the thesis, the advantages of tagging words as nouns, verbs, adjectives and adverbs without the use of sense disambiguation have been explored. A hierarchical model for abstraction of knowledge has been proposed, and those cases where such a model can improve summarization accuracy have been explained. Knowledge abstraction has been achieved by converting collocations into their hypernymous versions. The number of levels of abstraction varies based on the sense tag given to each word in the collocation being abstracted. Once abstractions have been determined, the Expectation-Maximization algorithm is used to determine the probability value of each collocation at every level of abstraction. 
A combination of abstracted collocations from various levels is then chosen and sentences are assigned scores based on the collocation information of these abstractions. This summarization scheme has been referred to as De-ESCI (Dictionary enhanced ESCI). It has been observed in many human summary data sets that the factual attribute of the human determines the choice of noun and verb pairs. Similarly, the emotional attribute of the human determines the choice of the number of noun and adjective pairs. In order to bring these attributes into the machine generated summaries, two variants of De-ESCI have been proposed. The summarizer with the factual attribute has been called De-ESCI-F, while the summarizer with the emotional attribute has been called De-ESCI-E in this thesis. Both create summaries having two parts. The first part of the summary created by De-ESCI-F is obtained by scoring and selecting only those sentences where a fixed number of nouns and verbs occur. The second part of De-ESCI-F is obtained by ranking and selecting those sentences which do not qualify for the selection process in the first part. Assigning sentence scores and selecting sentences for the second part of the summary is exactly like in ESCI. Similarly, the first part of De-ESCI-E is generated by scoring and selecting only those sentences where a fixed number of nouns and adjectives occur. The second part of the summary produced by De-ESCI-E is exactly like the second part in De-ESCI-F. As the model summary generated by human summarizers may or may not contain sentences with preference given to qualifiers (adjectives), the automatic summarizer does not know a priori whether to choose sentences with qualifiers over those without qualifiers. As there are two versions of the summary produced by De-ESCI-F and De-ESCI-E, one of them should be closer to the human summarizer’s point of view (in terms of giving importance to qualifiers). 
This technique of choosing the best candidate summary has been referred to as De-ESCI-F/E. Performance Metrics: The focus of this thesis is to propose new models and sentence ranking techniques aimed at improving the accuracy of the extract in terms of sentences selected, rather than on the readability of the summary. As a result, the order of sentences in the summary is not given importance during evaluation. Automatic evaluation metrics have been used and the performance of the automatic summarizer has been evaluated in terms of precision, recall and f-scores obtained by comparing its output with model human generated extract summaries. A novel summary evaluator called DeFuSE has been proposed in this thesis, and its scores are used along with the scores given by a standard evaluator called ROUGE. DeFuSE evaluates an extract in terms of precision, recall and f-score, relying on the use of the WordNet hypernymy structure to identify semantically similar sentences in a document. It also uses fuzzy set theory to compute the extent to which a sentence from the machine generated extract belongs to the model summary. Performance of candidate summarizers has been discussed in terms of percentage improvement in f-score relative to the baselines. The average of the ROUGE and DeFuSE f-scores for every summary is computed, and the mean value of these scores is used to compare performance improvement. Performance: For illustrative purposes, DUC 2002 and DUC 2003 multi-document data sets have been used. From these data sets only the 400 word summaries of DUC 2002 and track-4 (novelty track) summaries of DUC 2003 are useful for evaluation of sentence extracts and hence only these have been used. The f-score has been chosen as a measure of performance. Standard baselines such as coverage, size and lead and also probabilistic baselines have been used to measure percentage improvement in f-score of candidate summarizers relative to these baselines. 
Further, summaries generated by MEAD using centroid and length as features for ranking (MEAD-CL), MEAD using positional, centroid and length as features for ranking (MEAD-CLP), Microsoft Word automatic summarizer (MS-Word) and Latent Semantic Indexing (LSI) based summarizer were used to compare the performance of the proposed summarization schemes. Abstracting Collocation Method Information Theory Text Summarization Discrimination Threshold Dictionary Enhanced ESCI Collocation Information Automatic Extractive Summarizer Computer Science
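The sentence-scoring step that the abstract of record 45 describes (score each word by its CI minus its DT, sum the word scores per sentence, rank the sentences) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the exact CI formula, so `collocation_information` uses an assumed stand-in (the summed Shannon terms of a word's windowed co-occurrence pairs), the Discrimination Thresholds are passed in as a plain dictionary rather than derived from a large corpus, and the redundancy filter is omitted. All function and parameter names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def collocation_information(sentences, window=2):
    """Assumed CI stand-in: sum of -p*log2(p) over a word's collocation pairs."""
    pair_counts = Counter()
    for words in sentences:
        for i, w in enumerate(words):
            # Pair each word with the next `window` words, in both directions.
            for c in words[i + 1 : i + 1 + window]:
                pair_counts[(w, c)] += 1
                pair_counts[(c, w)] += 1
    total = sum(pair_counts.values())
    ci = defaultdict(float)
    for (w, _c), n in pair_counts.items():
        p = n / total  # joint probability estimate of the pair
        ci[w] += -p * math.log2(p)
    return ci


def rank_sentences(sentences, thresholds, window=2):
    """Return sentence indices ordered by the summed (CI - DT) word scores."""
    ci = collocation_information(sentences, window)
    scored = []
    for idx, words in enumerate(sentences):
        # Words without a supplied threshold are treated as DT = 0 (assumption).
        score = sum(ci[w] - thresholds.get(w, 0.0) for w in words)
        scored.append((score, idx))
    # Highest score first; ties keep document order.
    return [idx for _score, idx in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

An extract would then be the first few indices of `rank_sentences(tokenized_sentences, thresholds)`, mapped back to the original sentences.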
46	Zpracování uživatelských recenzí / Processing of User Reviews Cihlářová, Dita January 2019 (has links) People very often buy goods on the Internet that they cannot see or try beforehand, and therefore rely on the reviews of other customers. However, there may be too many reviews for a person to go through quickly and comfortably. The aim of this work is to offer an application that can recognize, in Czech reviews, which features of a product are commented on most and whether the comments are positive or negative. The results can save e-shop customers a lot of time and provide interesting feedback to the manufacturers of the products.
47	Information Extraction and Design of An Assisted QA system in Motor Design Luo, Hongyi January 2022 (has links) The Linz Center of Mechatronics’ SymSpace platform is designed to provide intelligent design and training for the traditional engineer training and industrial design approach in the field of motor design, which relies on the engineer’s own experience and manual work. This paper first analyzes and explores the usage patterns and possible improvement perspectives of motor design components using SymSpace user data. Then an attempt is made to summarize the motor design manual provided by LCM using a text summary model and use it for training engineers. Next, a question-and-answer system model was used to try to provide an aid system for engineers in design. The evaluation of text summaries and question and answer systems is difficult in the motor design domain because the amount of redundant textual information in this domain is small and key information is often presented in detail rather than in the main stem of the sentence. In this case, instead of evaluating the model using traditional machine scores, this paper refers to the feedback from LCM experts as future users. The final results show that, despite the problems of difficulty in explaining the reasons; the possibility of being misleading; and the loss of information details, both attempts are generally positive and the exploration in this direction is worthwhile. / Symspace från Linz Center of Mechatronics är utformad för att tillhandahålla intelligent design och utbildning för den traditionella ingenjörsutbildningen och den industriella designmetoden inom motorkonstruktion, som bygger på ingenjörens egen erfarenhet och manuellt arbete. I den här artikeln analyseras och utforskas först användningsmönster och möjliga förbättringsperspektiv för komponenter för motorkonstruktion med hjälp av användaruppgifter från Symspace. 
Därefter görs ett försök att sammanfatta den motorkonstruktionshandbok som tillhandahålls av LCM med hjälp av en modell för textsammanfattningar och använda den för att utbilda ingenjörer. Därefter användes en modell för ett system med frågor och svar för att försöka tillhandahålla ett hjälpsystem för ingenjörer vid konstruktion. Utvärderingen av textsammanfattningar och fråga-och-svar-system är svår inom motorkonstruktionsområdet eftersom mängden överflödig textinformation inom detta område är liten och nyckelinformation ofta presenteras i detalj snarare än i huvudstammen av meningen. I det här fallet hänvisar den här artikeln i stället för att utvärdera modellen med hjälp av traditionella maskinpoäng till feedback från LCM-experter som framtida användare. De slutliga resultaten visar att trots problemen med svårigheten att förklara orsakerna, möjligheten att vara vilseledande och förlusten av informationsdetaljer är båda försöken generellt sett positiva och att utforskningen i denna riktning är värd att fortsätta. Motor Design PEGASUS Question and Answer System Text Summarization Natural Language Processing Deep Learning Motorkonstruktion PEGASUS fråge- och svarssystem sammanfattning av text behandling av naturligt språk djupinlärning Computer and Information Sciences Data- och informationsvetenskap
48	An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition Tsatsaronis, George 10 October 2017 (has links) (PDF) This article provides an overview of the first BioASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BioASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies. BioASQ-Wettbewerb hierarchische Textsystematik semantische Indizierung Informationsbeschaffung Passage Retrieval Fragebeantwortung Mehrfachdokumentation Technische Universität Dresden Publikationsfonds BioASQ Competition Hierarchical Text Classification Semantic indexing Information retrieval Passage retrieval Question answering Multi-document text summarization Technische Universität Dresden Publishing Fund ddc:570 rvk:WA 15000
49	An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition Tsatsaronis, George 10 October 2017 (has links) This article provides an overview of the first BioASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BioASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies. info:eu-repo/classification/ddc/570 ddc:570
50	Metody sumarizace dokumentů na webu / Methods of Document Summarization on the Web Belica, Michal January 2013 (has links) The work deals with automatic summarization of documents in HTML format. Czech was chosen as the language of the web documents. The project focuses on text summarization algorithms. The work also covers document preprocessing for summarization and conversion of text into a representation suitable for summarization algorithms. General text mining is briefly discussed, but the project focuses mainly on automatic document summarization. Two simple summarization algorithms are introduced. The main attention is then paid to an advanced algorithm that uses latent semantic analysis. The result of the work is the design and implementation of a summarization module for the Python language. The final part of the work contains an evaluation of the summaries generated by the implemented summarization methods and a subjective comparison of them by the author.
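To make the latent-semantic-analysis idea in record 50 concrete, here is a heavily simplified sketch; it is not the module from the thesis. It builds a term-by-sentence count matrix and picks the sentence that loads most strongly on the single dominant latent topic. As an assumption made to keep the example dependency-free, it uses power iteration on AᵀA in place of a full singular value decomposition, so only the first right singular vector is approximated. The function name and parameters are hypothetical, and the input is assumed to be a non-empty list of tokenized sentences.

```python
import math


def dominant_topic_sentence(sentences, iterations=50):
    """Index of the sentence with the largest weight in the dominant topic."""
    vocab = sorted({w for s in sentences for w in s})
    # Term-by-sentence count matrix A (rows: terms, columns: sentences).
    a = [[s.count(t) for s in sentences] for t in vocab]
    n = len(sentences)
    v = [1.0] * n
    for _ in range(iterations):
        # One power-iteration step on A^T A: u = A v, then v = A^T u.
        u = [sum(row[j] * v[j] for j in range(n)) for row in a]
        v = [sum(a[i][j] * u[i] for i in range(len(vocab))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    # v approximates the first right singular vector of A.
    return max(range(n), key=lambda j: v[j])
```

A fuller LSA summarizer would compute several singular vectors and select one sentence per topic; the single-topic version above only shows the core mechanism.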
