1 |
Ranking, Labeling, and Summarizing Short Text in Social Media
Khabiri, Elham 03 October 2013
One of the key features driving the growth and success of the Social Web is large-scale participation through user-contributed content – often through short text in social media. Unlike traditional long-form documents – e.g., Web pages, blog posts – these short text resources are typically quite brief (on the order of hundreds of characters), are often of a personal nature (reflecting opinions and reactions of users), and are being generated at an explosive rate. Coupled with this explosion of short text in social media is the need for new methods to organize, monitor, and distill relevant information from these large-scale social systems, even in the face of the inherent “messiness” of short text: the wide variability in quality, style, and substance of short text generated by a legion of Social Web participants.
Hence, this dissertation seeks to develop new algorithms and methods to ensure the continued growth of the Social Web by enhancing how users engage with short text in social media. Concretely, this dissertation takes a three-fold approach:
First, this dissertation develops a learning-based algorithm to automatically rank short text comments associated with a Social Web object (e.g., Web document, image, video) based on the expressed preferences of the community itself, so that low-quality short text may be filtered and user attention may be focused on highly-ranked short text.
Second, this dissertation organizes short text through labeling, via a graph-based framework for automatically assigning relevant labels to short text. In this way, meaningful semantic descriptors may be assigned to short text for improved classification, browsing, and visualization.
Third, this dissertation presents a cluster-based summarization approach for extracting high-quality viewpoints expressed in a collection of short text, while maintaining diverse viewpoints. By summarizing short text, users may quickly assess the aggregate viewpoints expressed in a collection of short text, without the need to scan each of possibly thousands of short text items.
|
2 |
Procedures for identifying and modeling time-to-event data in the presence of non-proportionality
Zhu, Lei 22 January 2016
For both randomized clinical trials and prospective cohort studies, the Cox regression model is a powerful tool for evaluating the effect of a treatment or an explanatory variable on time-to-event outcome. This method assumes proportional hazards over time. Systematic approaches to efficiently evaluate non-proportionality and to model data in the presence of non-proportionality are investigated.
Six graphical methods are assessed to verify the proportional hazards assumption based on characteristics of the survival function, the cumulative hazard, or features of the residuals. Their performance is empirically evaluated with simulations by checking their ability to be consistent and sensitive in detecting proportionality or non-proportionality. Two-sample data are generated in three scenarios of proportional hazards and five types of alternatives (that is, non-proportionality). The usefulness of these graphical assessment methods depends on the event rate and the type of non-proportionality.
Three numerical (statistical testing) methods are compared via simulation studies to investigate the proportional hazards assumption. In evaluating data for proportionality versus non-proportionality, these tests either check for a non-zero slope in a regression of the variable or its residuals on a specific function of time, or apply a Kolmogorov-type supremum test. Our simulation results show that statistical test performance is affected by the number of events, the event rate, and the degree of divergence of non-proportionality for a given hazards scenario. Determining which test to use in practice depends on the specific situation under investigation. Both graphical and numerical approaches have benefits and costs, but they are complementary to each other.
Several approaches to model and summarize data exhibiting non-proportionality are presented, including non-parametric measurements and testing, semi-parametric models, and a parametric approach. Some illustrative examples using simulated data and real data are also presented. In summary, we present a systematic approach using both graphical and numerical methods to identify non-proportionality, and provide numerous modeling strategies for when proportionality is violated in time-to-event data.
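As an illustration of the proportional hazards assumption discussed in this abstract, the following is a minimal sketch and not the dissertation's own procedure. It assumes two groups with Weibull hazards (the Weibull choice, the parameter values, and the helper names `weibull_hazard`, `hazard_ratios`, and `is_constant` are all hypothetical, introduced only for this example). When both groups share the same shape parameter, the hazard ratio is constant over time (proportional hazards); when the shapes differ, the ratio drifts with time (non-proportionality).

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def hazard_ratios(times, group_a, group_b):
    """Hazard of group B divided by hazard of group A at each time point.

    group_a and group_b are (shape, scale) tuples (illustrative values only).
    """
    return [
        weibull_hazard(t, *group_b) / weibull_hazard(t, *group_a)
        for t in times
    ]


def is_constant(values, rel_tol=1e-9):
    """True when all values agree up to a small relative tolerance."""
    spread = max(values) - min(values)
    return spread <= rel_tol * max(abs(v) for v in values)


times = [0.5, 1.0, 2.0, 4.0]

# Same shape, different scale: the ratio does not depend on t.
ph = hazard_ratios(times, group_a=(1.5, 2.0), group_b=(1.5, 1.0))

# Different shapes: the ratio changes with t, so hazards are not proportional.
nph = hazard_ratios(times, group_a=(1.5, 2.0), group_b=(0.8, 2.0))

print("proportional:", is_constant(ph))
print("non-proportional:", not is_constant(nph))
```

In practice the hazards are not known in closed form, which is why the abstract relies on graphical diagnostics and statistical tests computed from observed data rather than on a direct comparison like this one.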
|
3 |
Task-specific summarization of networks: Optimization and Learning
Ekhtiari Amiri, Sorour 11 June 2019
Networks (also known as graphs) are everywhere. People-contact networks, social networks, email communication networks, and internet networks (among others) are examples of graphs in our daily life. The increasing size of these networks makes it harder to understand them. Summarizing these graphs can reveal key patterns, help in sensemaking, and accelerate existing graph algorithms. Intuitively, different summaries are desired for different purposes. For example, to stop viral infections, one may want to find an effective policy to immunize people in a people-contact network. In this case, a high-quality network summary should highlight roughly structurally important nodes. Others may want to detect communities in the same people-contact network, and hence, the summary should show cohesive groups of nodes. This implies that for each task, we should design a specific method to reveal related patterns. Despite the importance of task-specific summarization, there has not been much work in this area.
Hence, in this thesis, we design task-specific summarization frameworks for univariate and multivariate networks. We start with optimization-based approaches to summarize graphs for a particular task and finally propose general frameworks which automatically learn how to summarize for a given task and generalize it to similar networks.
1. Optimization-based approaches: Given a large network and a task, we propose summarization algorithms to highlight specific characteristics of the graph (i.e., structure, attributes, labels, dynamics) with respect to the task. We develop effective and efficient algorithms for various tasks such as content-aware influence maximization and time segmentation. In addition, we study many real-world networks and their summary graphs such as people-contact, news-blogs, etc. and visualize them to make sense of their characteristics given the input task.
2. Learning-based approaches: As our next step, we propose a unified framework which learns the process of summarization itself for a given task. First, we design a generalizable algorithm to learn to summarize graphs for a set of graph optimization problems. Next, we go further and add sparse human feedback to the learning process for the given optimization task.
To the best of our knowledge, we are the first to systematically bring the necessity of considering the given task to the forefront and emphasize the importance of learning-based approaches in network summarization. Our models and frameworks lead to meaningful discoveries. We also solve problems from various domains such as epidemiology, marketing, social media, cybersecurity, and interactive visualization. / Doctor of Philosophy / Networks (also known as graphs) are everywhere. People-contact networks, social networks, email communication networks, and internet networks (among others) are examples of graphs in our daily life. The increasing size of these networks makes it harder to understand them. Summarizing these graphs can reveal key information, help in sensemaking, and accelerate existing graph analysis methods. Intuitively, different summaries are desired for different purposes. For example, to stop viral infections, one may want to find an effective policy to immunize people in a people-contact network. In this case, a high-quality network summary should highlight roughly important nodes. Others may want to detect friendship communities in the same people-contact network, and hence, the summary should show cohesive groups of nodes. This implies that for each task, we should design a specific method to reveal related patterns. Despite the importance of task-specific summarization, there has not been much work in this area.
Hence, in this thesis, we design task-specific summarization frameworks for various types of networks with different approaches. To the best of our knowledge, we are the first to systematically bring the necessity of considering the given task to the forefront and emphasize the importance of learning-based approaches in network summarization. Our models and frameworks lead to meaningful discoveries. We also solve problems from various domains such as epidemiology, marketing, social media, cybersecurity, and interactive visualization.
|
4 |
Stora språkmodeller för bedömning av applikationsrecensioner : Implementering och undersökning av stora språkmodeller för att sammanfatta, extrahera och analysera nyckelinformation från användarrecensioner / Large Language Models for application review data : Implementation survey of Large Language Models (LLM) to summarize, extract, and analyze key information from user reviews
von Reybekiel, Algot, Wennström, Emil January 2024
Manually reviewing user reviews to extract relevant information can be a time-consuming process. This report investigates whether large language models can be used to summarize, extract, and analyze key information from reviews, and how such an application can be constructed.
Different models were found to perform differently depending on the metrics used and on the weighting between recall and precision. Furthermore, fine-tuning language models such as Llama 3 was found to improve performance in classifying useful reviews and, according to some metrics, led to higher performance than larger language models such as Chat-Bison. Specifically, for reviews translated into English, Llama 3:8b:Instruct, Chat-Bison, and the fine-tuned Llama 3:8b achieved F4 macro scores of 0.89, 0.90, and 0.91, respectively. A further finding is that the larger models (Chat-Bison, Text-Bison, and Gemini) performed better than the smaller models that were tested when generating summary texts from multiple reviews at a time. In general, the language models performed better when reviews were first translated into English before processing than when they were left in their original language, which for most reviews was Swedish. Another insight from the pre-processing phase is that the number of API calls to these language models can be minimized by filtering reviews by word length and rating. Beyond the findings related to language models, the results also demonstrated that vector databases and embeddings can provide a broader overview of useful reviews by leveraging the databases’ built-in ability to identify semantic similarities and cluster similar reviews together.
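The abstract reports F4 macro scores without defining them. The sketch below assumes the standard F-beta definition with beta = 4 (which weights recall far more heavily than precision) and an unweighted mean across classes for the macro average; the function names `f_beta` and `macro_f_beta` and the precision/recall values are hypothetical and are not taken from the thesis.

```python
def f_beta(precision, recall, beta):
    """Standard F-beta score: (1 + b^2) * P * R / (b^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f_beta(per_class, beta):
    """Unweighted mean of per-class F-beta scores.

    per_class is a list of (precision, recall) pairs, one per class.
    """
    scores = [f_beta(p, r, beta) for p, r in per_class]
    return sum(scores) / len(scores)


# Illustrative values only: with beta = 4, high recall is rewarded
# much more than high precision.
high_recall = f_beta(precision=0.5, recall=1.0, beta=4)
high_precision = f_beta(precision=1.0, recall=0.5, beta=4)
print(round(high_recall, 3), round(high_precision, 3))

# Macro average over two hypothetical classes (useful / not useful).
print(round(macro_f_beta([(0.5, 1.0), (1.0, 0.5)], beta=4), 3))
```

Under this definition, a classifier that misses few useful reviews scores well even if it also flags some unhelpful ones, which matches the abstract's remark that results depend on the weighting between recall and precision.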
|