  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Extractive Multi-document Summarization of News Articles

Grant, Harald January 2019 (has links)
Publicly available data grows exponentially through web services and technological advancements. Multi-document summarization (MDS) can be used to comprehend such large data streams. In this research, the area of multi-document summarization is investigated. Multiple systems for extractive multi-document summarization are implemented using modern techniques, in the form of the pre-trained BERT language model for word embeddings and sentence classification. This is combined with well-proven techniques: the TextRank ranking algorithm, the Waterfall architecture and anti-redundancy filtering. The systems are evaluated on the DUC-2002, 2006 and 2007 datasets using the ROUGE metric. The results show that the BM25 sentence representation, implemented in the TextRank model with the Waterfall architecture and an anti-redundancy technique, outperforms the other implementations and is competitive with other state-of-the-art systems. A cohesive model is derived from the leading system and tried in a user study using a real-time news detection application with users from the news domain. The study shows a clear preference for cohesive summaries in the case of extractive multi-document summarization, with the cohesive summary preferred in the majority of cases.
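The pipeline this abstract describes (rank sentences with TextRank, then filter out redundant ones) can be illustrated in a few lines. This is a generic sketch using the original word-overlap similarity rather than the thesis's BM25 representation or Waterfall architecture; the `summarize` helper, its redundancy threshold and the toy sentences are all illustrative choices.

```python
import math

def similarity(s1, s2):
    # Word-overlap similarity normalised by log sentence lengths (the
    # original TextRank measure); the thesis uses a BM25-based variant.
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(w1 & w2)
    if overlap == 0 or len(w1) < 2 or len(w2) < 2:
        return 0.0
    return overlap / (math.log(len(w1)) + math.log(len(w2)))

def textrank(sentences, d=0.85, iters=50):
    # Power iteration over the sentence-similarity graph (PageRank-style).
    n = len(sentences)
    sim = [[0.0 if i == j else similarity(a, b)
            for j, b in enumerate(sentences)] for i, a in enumerate(sentences)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - d) + d * sum(sim[j][i] / row_sums[j] * scores[j]
                                    for j in range(n) if row_sums[j] > 0)
                  for i in range(n)]
    return scores

def summarize(sentences, k=2, redundancy=0.5):
    # Take top-ranked sentences, skipping any too similar to one already
    # chosen (a simple anti-redundancy filter).
    scores = textrank(sentences)
    chosen = []
    for i in sorted(range(len(sentences)), key=lambda i: -scores[i]):
        if all(similarity(sentences[i], sentences[j]) < redundancy for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [sentences[i] for i in sorted(chosen)]

docs = ["the cat sat on the mat",
        "the cat sat on the mat today",
        "dogs bark loudly at night",
        "birds fly south in winter"]
print(summarize(docs, k=2))
```

The redundancy check is what keeps the two near-duplicate "cat" sentences from both entering the summary.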
22

Charakterizace chodců ve videu / Pedestrian Attribute Analysis

Studená, Zuzana January 2019 (has links)
This work deals with obtaining information about pedestrians captured by static external cameras located in public outdoor or indoor spaces. The aim is to obtain as much information as possible. Information such as gender, age, type of clothing, accessories, fashion style, or overall personality is obtained using convolutional neural networks. One part of the work consists of creating a new dataset that captures pedestrians and includes information about each person's sex, age, and fashion style. Another part of the thesis is the design and implementation of convolutional neural networks that classify the mentioned pedestrian characteristics. The neural networks evaluate pedestrian input images from the PETA, FashionStyle14 and BUT Pedestrian Attributes datasets. Experiments performed on the PETA and FashionStyle datasets compare my results to various convolutional neural networks described in publications. Further experiments are shown on the created BUT pedestrian-attributes dataset.
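The multi-attribute setup described above (one shared image representation feeding several independent classification heads) can be sketched as follows. The feature size, attribute heads and class counts are invented for illustration, and a real system would use a trained CNN backbone rather than random weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over one head's logits.
    e = np.exp(z - z.max())
    return e / e.sum()

# Stand-in for the CNN backbone output: one shared feature vector per image.
feature = rng.normal(size=128)

# One linear head per pedestrian attribute (class counts are illustrative).
heads = {
    "gender": rng.normal(size=(2, 128)),
    "age": rng.normal(size=(4, 128)),      # e.g. child / young / adult / senior
    "style": rng.normal(size=(3, 128)),    # e.g. casual / formal / sporty
}

predictions = {name: softmax(W @ feature) for name, W in heads.items()}
for name, probs in predictions.items():
    print(name, int(probs.argmax()), round(float(probs.max()), 3))
```

Each head produces its own probability distribution, so attributes are predicted jointly from one backbone pass.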
23

Dialogue systems based on pre-trained language models

Zeng, Yan 07 1900 (has links)
Les modèles de langue pré-entraînés ont montré leur efficacité dans beaucoup de tâches de traitement de la langue naturelle. Ces modèles peuvent capter des régularités générales d'une langue à partir d'un grand ensemble de textes, qui sont utiles dans la plupart des applications en traitement de langue naturelle. Dans ce mémoire, nous étudions les problèmes de dialogue, i.e. générer une réponse à un énoncé de l'utilisateur. Nous exploitons les modèles de langue pré-entraînés pour traiter différents aspects des systèmes de dialogue. Premièrement, les modèles de langue pré-entraînés sont entraînés et utilisés dans les systèmes de dialogue de différentes façons. Il n'est pas clair quelle façon est la plus appropriée. Pour le dialogue orienté-tâche, l'approche de l'état de l'art pour le suivi de l'état de dialogue (Dialogue State Tracking) utilise BERT comme encodeur et empile un autre réseau de neurones récurrent (RNN) sur les sorties de BERT comme décodeur. Dans ce cas, seul l'encodeur peut bénéficier des modèles de langue pré-entraînés. Dans la première partie de ce mémoire, nous proposons une méthode qui utilise un seul modèle BERT pour l'encodeur et le décodeur, permettant ainsi un ajustement de paramètres plus efficace. Notre méthode atteint une performance qui dépasse l'état de l'art. Pour la tâche de génération de réponses dans un chatbot, nous comparons 4 approches communément utilisées. Elles sont basées sur des modèles pré-entraînés et utilisent des objectifs et des mécanismes d'attention différents. En nous appuyant sur des expérimentations, nous observons l'impact de deux types de disparité qui sont largement ignorées dans la littérature: disparité entre pré-entraînement et peaufinage, et disparité entre peaufinage et génération de réponse. Nous montrons que l'impact de ces disparités devient évident quand le volume de données d'entraînement est limité.
Afin de remédier à ce problème, nous proposons deux méthodes qui réduisent les disparités, permettant d'améliorer la performance. Deuxièmement, même si les méthodes basées sur des modèles pré-entraînés ont connu de grands succès en dialogue général, nous devons de plus en plus traiter le problème de dialogue conditionné, c'est-à-dire dialogue en relation à une certaine condition (qui peut désigner un personnage, un sujet, etc.). Des chercheurs se sont aussi intéressés aux systèmes de chatbot avec des habiletés de conversation multiples, i.e. chatbot capable de confronter différentes situations de dialogues conditionnés. Ainsi, dans la seconde partie de ce mémoire, nous étudions le problème de génération de dialogue conditionné. D'abord, nous proposons une méthode générale qui exploite non seulement des données de dialogues conditionnées, mais aussi des données non-dialogues (textes) conditionnées. Ces dernières sont beaucoup plus faciles à acquérir en pratique. Ceci nous permet d'atténuer le problème de rareté de données. Ensuite, nous proposons des méthodes qui utilisent le concept d'adaptateur proposé récemment dans la littérature. Un adaptateur permet de renforcer un système de dialogue général en lui donnant une habileté spécifique. Nous montrons que les adaptateurs peuvent encoder des habiletés de dialogue conditionné de façon stricte ou flexible, tout en utilisant seulement 6% plus de paramètres. Ce mémoire contient 4 travaux sur deux grands problèmes de dialogue: l'architecture inhérente du modèle de dialogue basé sur des modèles de langue pré-entraînés, et l'enrichissement d'un système de dialogue général pour avoir des habiletés spécifiques. Ces travaux non seulement nous permettent d'obtenir des performances dépassant l'état de l'art, mais aussi soulignent l'importance de concevoir l'architecture du modèle pour bien correspondre à la tâche, plutôt que simplement augmenter le volume de données d'entraînement et la puissance de calcul brute.
/ Pre-trained language models (LMs) have been shown to be effective in many NLP tasks. They can capture general language regularities from a large amount of text, which are useful for most applications related to natural language. In this thesis, we study the problems of dialogue, i.e. generating a response to a user's utterance. We exploit pre-trained language models to deal with different aspects of dialogue systems. First, pre-trained language models have been trained and used in different ways in dialogue systems, and it is unclear which way is best. For task-oriented dialogue systems, the state-of-the-art framework for Dialogue State Tracking (DST) uses BERT as the encoder and stacks an RNN upon BERT outputs as the decoder, so pre-trained language models are only leveraged for the encoder. In the first part of the thesis, we investigate methods using a single BERT model for both the encoder and the decoder, allowing for more effective parameter updating. Our method achieves new state-of-the-art performance. For the task of response generation in generative chatbot systems, we further compare four commonly used frameworks based on pre-trained LMs, which use different training objectives and attention mechanisms. Through extensive experiments, we observe the impact of two types of discrepancy that have received little attention in the literature: the pretrain-finetune discrepancy and the finetune-generation discrepancy (i.e. differences between pre-training and fine-tuning, and between fine-tuning and generation). We show that the impact of these discrepancies surfaces when only a limited amount of training data is available. To alleviate the problem, we propose two methods that reduce the discrepancies, yielding improved performance. Second, even though pre-training based methods have shown excellent performance in general dialogue generation, we are more and more faced with the problem of conditioned conversation, i.e. conversation in relation to some condition (a persona, a topic, etc.). Researchers are also interested in multi-skill chatbot systems, namely equipping a chatbot with abilities to confront different conditioned generation tasks. Therefore, in the second part of the thesis, we investigate the problem of conditioned dialogue generation. First, we propose a general method that leverages not only conditioned dialogue data, but also conditioned non-dialogue text data, which are much easier to collect, in order to alleviate the data scarcity issue of conditioned dialogue generation. Second, we build on the recently proposed concept of Adapter, which enhances a general dialogue system with a specific dialogue skill, and investigate ways to learn such skills. We show that an Adapter has enough capacity to model a dialogue skill for either loosely-conditioned or strictly-conditioned response generation, while using only 6% more parameters. This thesis contains four pieces of work relating to two general problems in dialogue systems: the inherent architecture for dialogue systems based on pre-trained LMs, and the enhancement of a general dialogue system with specific skills. The studies not only propose new approaches that outperform the current state of the art, but also stress the importance of carefully designing the model architecture to fit the task, instead of simply increasing the amount of training data and raw computation power.
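The Adapter referred to above is usually a small bottleneck module inserted into each layer of a frozen pre-trained model. A minimal sketch, with illustrative sizes and a zero-initialized up-projection so the adapter starts out as the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
hidden, bottleneck = 768, 48   # illustrative sizes, not the thesis's exact config

W_down = rng.normal(scale=0.02, size=(bottleneck, hidden))
W_up = np.zeros((hidden, bottleneck))  # zero init: the adapter starts as the identity

def adapter(h):
    # Bottleneck adapter: down-project, nonlinearity, up-project, residual.
    # The surrounding pre-trained layer stays frozen; only W_down/W_up train.
    return h + W_up @ np.maximum(W_down @ h, 0.0)

h = rng.normal(size=hidden)
out = adapter(h)

# Extra parameters relative to one hidden-by-hidden dense layer:
added = W_down.size + W_up.size
print(round(added / (hidden * hidden), 3))
```

The small bottleneck is what keeps the parameter overhead low, in the spirit of the "only 6% more parameters" figure quoted in the abstract.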
24

Nivåbedömning i oktavband: Är det rimligt vid hörapparatanpassning? / Level evaluation in octave bands: Is it reasonable when fitting hearing aids?

Stolt, Petter, Wahlsten, Markus January 2023 (has links)
Bakgrund: Finjusteringar av hörapparatens förstärkning görs för att validera förstärkningen. Patientens förmåga att kategorisera ljudbilden ligger till grund för de justeringar som görs. Syfte: Att utvärdera en praktiknära metod för finjustering av hörapparater. Metod: Deltagarna (N = 18) fick lyssna på och bedöma ljudbilden för ett talmaterial med slumpade nivåmodifieringar i oktavbandet 4 kHz. Försöksledaren korrigerade ljudbilden utifrån deltagarnas nivåbedömning, till dess att deltagarna upplevde att ljudbilden var naturlig. Deltagarna fick efter halva undersökningen, som intervention, lyssna på en genomgång som förklarade och jämförde de olika ljudbilderna. Resultat: Deltagarnas nivåbedömningar ledde till korrigeringar i oktavbandet som var statistiskt signifikanta, men en normalisering av oktavbandet uppnåddes inte. Efter genomgången kunde fler nivåmodifikationer korrigeras med en statistiskt signifikant skillnad. Nivåmodifikationer som kan kategoriseras som metalliska/skarpa ledde oftare till en statistiskt signifikant korrigering än nivåmodifikationer som kan kategoriseras som otydliga/dova. Slutsatser: Om finjusteringar av hörapparaterna görs, bör audionomen ha klart för sig att det kan behövas större nivåförändringar i större frekvensband för att patienten ska ha möjlighet att uppleva en skillnad i ljudbilden i en klinisk miljö. / Background: Fine-tuning of the hearing aid amplification is done to validate the amplification. The patient's ability to describe the sound quality forms the basis for the fine-tuning. Aim: To evaluate a practice-oriented method for fine-tuning hearing aids. Methods: The participants (N = 18) listened to and evaluated the sound quality of a speech material with randomized level modifications in the 4 kHz octave band. The sound quality was adjusted according to the participants' evaluations, until a normalized sound quality was perceived by the participants. Halfway through the examination the participants, as an intervention, listened to a briefing which explained and compared the different sound qualities. Results: The participants' level evaluations led to statistically significant adjustments in the octave band, but a normalization of the octave band could not be achieved. After the briefing, a larger number of level modifications were adjusted with statistical significance. Level modifications categorized as metallic/sharp more often led to a statistically significant adjustment than level modifications categorized as unclear/dull. Conclusions: If fine-tuning of hearing aids is done, the audiologist should be aware that larger level adjustments across broader frequency bands may be needed for the patient to be able to notice a difference in sound quality in a clinical setting.
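The level modifications in the study are specified in decibels, and a dB change maps to a linear amplitude factor of 10^(dB/20). A minimal illustrative sketch (the octave-band filtering itself is assumed to be done elsewhere, e.g. in the fitting software):

```python
def db_to_amplitude(db):
    # A level change of `db` decibels scales signal amplitude by 10^(db/20).
    return 10 ** (db / 20)

def apply_band_gain(samples, db):
    # Apply a level modification to already band-filtered samples,
    # e.g. the 4 kHz octave band used in the study.
    g = db_to_amplitude(db)
    return [s * g for s in samples]

print(db_to_amplitude(6.0))   # +6 dB roughly doubles the amplitude
print(db_to_amplitude(-6.0))  # -6 dB roughly halves it
```

This is why even modest-looking dB modifications correspond to substantial amplitude changes in the band.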
25

Fine-Tuning Pre-Trained Language Models for CEFR-Level and Keyword Conditioned Text Generation : A comparison between Google’s T5 and OpenAI’s GPT-2 / Finjustering av förtränade språkmodeller för CEFR-nivå och nyckelordsbetingad textgenerering : En jämförelse mellan Googles T5 och OpenAIs GPT-2

Roos, Quintus January 2022 (has links)
This thesis investigates the possibilities of conditionally generating English sentences based on keywords, framing content, and different difficulty levels of vocabulary. It aims to contribute to the field of Conditional Text Generation (CTG), a type of Natural Language Generation (NLG), where the process of creating text is based on a set of conditions, such as words, topics, content or perceived sentiments. Specifically, it compares the performances of two well-known model architectures: Sequence-to-Sequence (Seq2Seq) and Autoregressive (AR). These are applied to two different tasks, individually and combined. The Common European Framework of Reference (CEFR) is used to assess the vocabulary level of the texts. In the absence of openly available CEFR-labelled datasets, the author has developed a new methodology with the host company to generate suitable datasets. The generated texts are evaluated on the accuracy of the vocabulary levels and on readability using readily available formulas. The analysis combines four established readability metrics and assesses classification accuracy. Both models show a high degree of accuracy when classifying texts into different CEFR levels. However, the same models are weaker when generating sentences based on a desired CEFR level. This study contributes empirical evidence suggesting that: (1) Seq2Seq models have a higher accuracy than AR models in generating English sentences based on a desired CEFR level and keywords; (2) combining Multi-Task Learning (MTL) with instruction-tuning is an effective way to fine-tune models on text-classification tasks; and (3) it is difficult to assess the quality of computer-generated language using only readability metrics. / I den här studien undersöks möjligheterna att villkorligt generera engelska meningar på så kallat "naturligt" språk, som baseras på nyckelord, innehåll och vokabulärnivå. Syftet är att bidra till området betingad textgenerering, en underkategori av naturlig textgenerering, vilket är en metod för att skapa text givet vissa ingångsvärden, till exempel ämne, innehåll eller uppfattning. I synnerhet jämförs prestandan hos två välkända modellarkitekturer: sekvens-till-sekvens (Seq2Seq) och autoregressiv (AR). Dessa tillämpas på två uppgifter, såväl individuellt som kombinerat. Den europeiska gemensamma referensramen (CEFR) används för att bedöma texternas vokabulärnivå. I och med avsaknaden av öppet tillgängliga CEFR-märkta dataset har författaren tillsammans med värdföretaget utvecklat en ny metod för att generera lämpliga dataset. De av modellerna genererade texterna utvärderas utifrån vokabulärnivå och läsbarhet samt hur väl de uppfyller den sökta CEFR-nivån. Båda modellerna visade en hög träffsäkerhet när de klassificerade texter i olika CEFR-nivåer. Dock uppvisade samma modeller en sämre förmåga att generera meningar utifrån en önskad CEFR-nivå. Denna studie bidrar med empiriska bevis som tyder på: (1) att Seq2Seq-modeller har högre träffsäkerhet än AR-modeller när det gäller att generera engelska meningar utifrån en önskad CEFR-nivå och nyckelord; (2) att kombinera inlärning av multipla uppgifter med instruktionsjustering är ett effektivt sätt att finjustera modeller för textklassificering; samt (3) att man inte kan bedöma kvaliteten av datorgenererade meningar genom att endast använda läsbarhetsmått.
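One example of the "readily available formulas" the abstract alludes to is the Flesch-Kincaid grade level, 0.39·(words/sentences) + 11.8·(syllables/words) − 15.59. A sketch with a naive vowel-group syllable counter (a real implementation would use a proper syllable lexicon; this one is only illustrative):

```python
import re

def count_syllables(word):
    # Naive heuristic: count groups of consecutive vowels.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

print(round(flesch_kincaid_grade("The cat sat on the mat. It was warm."), 2))
```

Very simple text can score below grade zero, which hints at point (3) above: readability formulas alone say little about the quality of generated language.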
26

Stora språkmodeller för bedömning av applikationsrecensioner : Implementering och undersökning av stora språkmodeller för att sammanfatta, extrahera och analysera nyckelinformation från användarrecensioner / Large language models for application review data : Implementation and survey of large language models (LLMs) to summarize, extract, and analyze key information from user reviews

von Reybekiel, Algot, Wennström, Emil January 2024 (has links)
Manuell granskning av användarrecensioner för att extrahera relevant information kan vara en tidskrävande process. Denna rapport har undersökt om stora språkmodeller kan användas för att sammanfatta, extrahera och analysera nyckelinformation från recensioner, samt hur en sådan applikation kan konstrueras.  Det visade sig att olika modeller presterade olika bra beroende på mätvärden och viktning mellan recall och precision. Vidare visade det sig att finjustering av språkmodeller som Llama 3 förbättrade prestationen vid klassifikation av användbara recensioner och ledde, enligt vissa mätvärden, till högre prestation än större språkmodeller som Chat-Bison. För engelskt översatta recensioner hade Llama 3:8b:Instruct, Chat-Bison samt den finjusterade versionen av Llama 3:8b ett F4-makro-score på 0.89, 0.90 respektive 0.91. Ytterligare ett resultat är att de större modellerna Chat-Bison, Text-Bison och Gemini presterade bättre än de mindre modeller som testades vid generering av sammanfattande texter, vid inmatning av flertalet recensioner åt gången.  Generellt sett presterade språkmodellerna också bättre om recensionerna först översattes till engelska innan bearbetning, snarare än då recensionerna var skrivna på originalspråket, där majoriteten av recensionerna var skrivna på svenska. En annan lärdom från förbearbetningen av recensioner är att antalet anrop till dessa språkmodeller kan minimeras genom att filtrera utifrån ordlängd och betyg.  Utöver språkmodeller visade resultaten att användningen av vektordatabaser och embeddings kan ge en större överblick över användbara recensioner genom vektordatabasers inbyggda förmåga att hitta semantiska likheter och samla liknande recensioner i kluster. / Manually reviewing user reviews to extract relevant information can be a time-consuming process. This report investigates whether large language models can be used to summarize, extract, and analyze key information from reviews, and how such an application can be constructed.  It was discovered that different models exhibit varying degrees of performance depending on the metrics and the weighting between recall and precision. Furthermore, fine-tuning of language models such as Llama 3 was found to improve performance in classifying useful reviews and, according to some metrics, led to higher performance than larger language models like Chat-Bison. Specifically, for English-translated reviews, Llama 3:8b:Instruct, Chat-Bison, and the fine-tuned Llama 3:8b had F4 macro scores of 0.89, 0.90 and 0.91, respectively. A further finding is that the larger models, Chat-Bison, Text-Bison, and Gemini, performed better than the smaller models that were tested when inputting multiple reviews at a time for summary text generation.  In general, language models performed better if reviews were first translated into English before processing, rather than when reviews were written in the original language, where most reviews were written in Swedish. Additionally, another insight from the pre-processing phase is that the number of API calls to these language models can be minimized by filtering based on word length and rating. In addition to findings related to language models, the results also demonstrated that the use of vector databases and embeddings can provide a greater overview of reviews by leveraging the databases' built-in ability to identify semantic similarities and cluster similar reviews together.
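The vector-database idea above reduces to grouping reviews whose embeddings point in similar directions. A minimal sketch with toy 3-dimensional vectors standing in for real embeddings, and a greedy threshold pass in place of a database's similarity index:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster(embeddings, threshold=0.9):
    # Greedy single-pass clustering: attach each vector to the first
    # cluster whose representative is similar enough, else start a new one.
    clusters = []
    for i, e in enumerate(embeddings):
        for c in clusters:
            if cosine(embeddings[c[0]], e) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy embeddings: the first two vectors point the same way, the third does not.
vecs = [(1.0, 0.1, 0.0), (0.9, 0.12, 0.01), (0.0, 0.0, 1.0)]
print(cluster(vecs))  # two clusters: [[0, 1], [2]]
```

A real vector database performs the same similarity comparison, but over high-dimensional review embeddings and with an approximate-nearest-neighbour index instead of a linear scan.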
27

Automatic text summarization of French judicial data with pre-trained language models, evaluated by content and factuality metrics

Adler, Malo January 2024 (has links)
During an investigation carried out by a police officer or a gendarme, interview reports are written, which can run to several pages. The high-level goal of this thesis is to study automatic and reliable text summarization methods to help with this time-consuming task. One challenge comes from the specific French judicial data that we wish to summarize; another comes from the need for reliable and factual models. First, this thesis focuses on automatic summarization evaluation, in terms of both content (how well the summary captures essential information of the source text) and factuality (to what extent the summary only includes information from, or coherent with, the source text). Factuality evaluation, in particular, is of crucial interest when using LLMs for judicial purposes, because of their hallucination risks. Notably, we propose a light variation of SelfCheckGPT, which has a stronger correlation with human judgment (0.743) than the widespread BARTScore (0.542) on our study dataset. Other paradigms, such as Question-Answering, are also studied in this thesis, but they underperform in comparison. Then, extractive summarization methods are explored and compared, including one based on graphs via the TextRank algorithm, and one based on greedy optimization. The latter (overlap rate: 0.190, semantic similarity: 0.513) clearly outperforms the base TextRank (overlap rate: 0.172, semantic similarity: 0.506). A variant of TextRank with a threshold mechanism is also proposed, yielding a non-negligible improvement (overlap rate: 0.180, semantic similarity: 0.513). Finally, abstractive summarization with pre-trained LLMs based on the Transformer architecture is studied. In particular, several general-purpose and multilingual models (Llama-2, Mistral and Mixtral) were objectively compared on a summarization dataset of judicial procedures from the French police. 
Results show that the performance of these models is strongly related to their size: Llama-2 7B struggles to adapt to uncommon data (overlap rate: 0.083, BARTScore: -3.099), while Llama-2 13B (overlap rate: 0.159, BARTScore: -2.718) and Llama-2 70B (overlap rate: 0.191, BARTScore: -2.479) have proven quite versatile and efficient. To improve the performance of the smallest models, empirical prompt-engineering and parameter-efficient fine-tuning are explored. Notably, our fine-tuned version of Mistral 7B reaches performance comparable to that of much larger models (overlap rate: 0.185, BARTScore: -2.060), without the need for empirical prompt-engineering, and with a linguistic style closer to what is expected. / Under en utredning som görs av en polis eller en gendarm skrivs förhörsprotokoll vars längd kan vara upp till flera sidor. Målet på hög nivå med denna rapport är att studera olika automatiska och tillförlitliga textsammanfattningsmetoder för att hjälpa till med denna tidskrävande uppgift. En utmaning kommer från de specifika franska och rättsliga uppgifter som vi vill sammanfatta; och en annan utmaning kommer från behovet av pålitliga och sakliga modeller. För det första fokuserar denna rapport på automatisk sammanfattningsutvärdering, både vad gäller innehåll (hur väl sammanfattningen fångar väsentlig information i källtexten) och fakta (i vilken utsträckning sammanfattningen endast innehåller information från eller överensstämmer med källtexten). Faktautvärdering, i synnerhet, är av avgörande intresse när man använder LLM för rättsliga ändamål, på grund av deras hallucinationsrisker. Vi föreslår särskilt en lätt variant av SelfCheckGPT, som har en starkare korrelation med mänskligt omdöme (0,743) än den utbredda BARTScore (0,542) på vår studiedatauppsättning. Andra paradigm, såsom Question-Answering, studeras också i denna rapport, men underpresterar i jämförelse. 
Sedan utforskas och jämförs extraktiva sammanfattningsmetoder, inklusive en baserad på grafer via TextRank-algoritmen och en baserad på girig optimering. Den senare (överlappning: 0,190, semantisk likhet: 0,513) överträffar klart grundversionen av TextRank (överlappning: 0,172, semantisk likhet: 0,506). En förbättring av TextRank med en tröskelmekanism föreslås också, vilket leder till en icke försumbar förbättring (överlappning: 0,180, semantisk likhet: 0,513). Slutligen studeras abstraktiv sammanfattning med förtränade LLM:er baserade på Transformer-arkitekturen. I synnerhet jämfördes flera allmänna och flerspråkiga modeller (Llama-2, Mistral och Mixtral) objektivt på en sammanfattningsdatauppsättning av rättsliga förfaranden från den franska polisen. Resultaten visar att prestandan för dessa modeller är starkt relaterad till deras storlek: Llama-2 7B kämpar för att anpassa sig till ovanliga data (överlappning: 0,083, BARTScore: -3,099), medan Llama-2 13B (överlappning: 0,159, BARTScore: -2,718) och Llama-2 70B (överlappning: 0,191, BARTScore: -2,479) har visat sig vara ganska mångsidiga och effektiva. För att förbättra prestandan för de minsta modellerna utforskas empirisk prompt-teknik och parametereffektiv finjustering. Noterbart är att vår finjusterade version av Mistral 7B når prestanda som är jämförbar med mycket större modellers (överlappning: 0,185, BARTScore: -2,060), utan behov av empirisk prompt-teknik och med en språklig stil som ligger närmare vad som förväntas.
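The greedy-optimization baseline mentioned above can be illustrated as maximal marginal coverage of the document's vocabulary: repeatedly pick the sentence that adds the most not-yet-covered words. This is a generic reconstruction under that assumption, not the thesis's exact objective.

```python
def greedy_summary(sentences, k=2):
    # Greedily select sentences maximizing coverage of new (lowercased) words.
    covered = set()
    chosen = []
    remaining = list(range(len(sentences)))
    for _ in range(k):
        gains = [(len(set(sentences[i].lower().split()) - covered), i)
                 for i in remaining]
        gain, best = max(gains)
        if gain == 0:  # nothing new left to cover
            break
        chosen.append(best)
        remaining.remove(best)
        covered |= set(sentences[best].lower().split())
    return [sentences[i] for i in sorted(chosen)]

doc = ["police interviewed the witness on monday",
       "the witness described the suspect",
       "police interviewed the witness"]
print(greedy_summary(doc, k=2))
```

Note how the third sentence, entirely covered by the first pick, can never be selected: redundancy is avoided as a side effect of the coverage objective.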
28

Direct Preference Optimization for Improved Technical Writing Assistance : A Study of How Language Models Can Support the Writing of Technical Documentation at Saab / En studie i hur språkmodeller kan stödja skrivandet av teknisk dokumentation på Saab

Bengtsson, Hannes, Habbe, Patrik January 2024 (has links)
This thesis explores the potential of Large Language Models (LLMs) to assist in the technical documentation process at Saab. With the increasing complexity of, and regulatory demands on, such documentation, the objective is to investigate advanced natural language processing techniques as a means of streamlining the creation of technical documentation. Although many standards exist, this thesis focuses on ASD-STE100, Simplified Technical English (STE), a controlled language for technical documentation. STE's primary aim is to ensure that technical documents are understandable to individuals regardless of their native language or English proficiency.  The study focuses on the implementation of Direct Preference Optimization (DPO) and Supervised Instruction Fine-Tuning (SIFT) to refine the capabilities of LLMs in producing clear and concise outputs that comply with STE. Through a series of experiments, we investigate the effectiveness of LLMs in interpreting and simplifying technical language, with a particular emphasis on adherence to the STE standard. The study utilizes a dataset of target data paired with synthetic source data generated by an LLM. We apply various model training strategies, including zero-shot performance, supervised instruction fine-tuning, and direct preference optimization. We evaluate the models' outputs using established quantitative metrics for text simplification, and substitute human evaluators with company-internal software that evaluates adherence to company standards and STE. Our findings suggest that while LLMs can significantly contribute to the technical writing process, the choice of training methods and the quality of the data play crucial roles in the model's performance. The study shows how LLMs can improve productivity and reduce manual work, discusses the remaining problems, and suggests directions for improving the automation of technical documentation.
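The DPO objective used above trains the policy so that the log-probability margin between a preferred and a rejected completion grows relative to a frozen reference model. A numerical sketch of the per-pair loss (the log-probabilities below are made-up numbers, purely for illustration):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Direct Preference Optimization per-pair loss:
    # -log sigmoid(beta * ((chosen margin vs. reference) - (rejected margin)))
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1 / (1 + math.exp(-beta * margin)))

# Made-up log-probabilities: the policy already prefers the chosen answer
# slightly more than the reference model does, so the loss dips below log 2.
loss = dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                ref_chosen=-13.0, ref_rejected=-14.5)
print(loss)
```

When policy and reference agree exactly, the margin is zero and the loss is log 2; training pushes the margin positive, driving the loss toward zero.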
29

Event-Cap – Event Ranking and Transformer-based Video Captioning / Event-Cap – Event rankning och transformerbaserad video captioning

Cederqvist, Gabriel, Gustafsson, Henrik January 2024 (has links)
In the field of video surveillance, vast amounts of data are gathered each day. To be able to identify what occurred during a recorded session, a human annotator has to go through the footage and annotate the different events. This is a tedious and expensive process that takes up a large amount of time. With the rise of machine learning, and in particular deep learning, the fields of both image and video captioning have seen large improvements. Contrastive Language-Image Pre-training is capable of efficiently learning a multimodal space, and is thus able to merge the understanding of text and images. This enables visual features to be extracted and processed into text describing the visual content. This thesis presents a system for extracting and ranking important events from surveillance videos, as well as a way of automatically generating a description of each event. By utilizing the pre-trained models X-CLIP and GPT-2 to extract visual information from the videos and process it into text, a video captioning model was created that requires very little training. Additionally, a ranking system was implemented to extract important parts of the video, utilizing anomaly detection as well as polynomial regression. Captions were evaluated using the metrics BLEU, METEOR, ROUGE and CIDEr, and the model achieves scores comparable to other video captioning models. Additionally, captions were evaluated by experts in the field of video surveillance, who rated them on accuracy, reaching up to 62.9%, and semantic quality, reaching 99.2%. Furthermore, the ranking system was also evaluated by the experts, who agree with it 78% of the time. / Inom videoövervakning samlas stora mängder data in varje dag. För att kunna identifiera vad som händer i en inspelad övervakningsvideo måste en människa gå igenom och annotera de olika händelserna. Detta är en långsam och dyr process som tar upp mycket tid. 
Under de senaste åren har det skett en enorm ökning av användandet av olika maskininlärningsmodeller. Djupinlärningsmodeller har fått stor framgång när det kommer till att generera korrekt och trovärdig text. De har också använts för att generera beskrivningar för både bild och video. Contrastive Language-Image Pre-training har gjort det möjligt att träna en multimodal rymd som kombinerar förståelsen av text och bild. Detta gör det möjligt att extrahera visuell information och skapa textbeskrivningar. Denna masteruppsats beskriver ett system som kan extrahera och ranka viktiga händelser i en övervakningsvideo samt ett automatiskt sätt att generera beskrivningar till dessa. Genom att använda de förtränade modellerna X-CLIP och GPT-2 för visuell informationsextraktion och textgenerering har en videobeskrivningsmodell skapats som endast behöver en liten mängd träning. Dessutom har ett rankningssystem implementerats för att extrahera de viktiga delarna i en video genom att använda anomalidetektion och polynomregression. Videobeskrivningarna utvärderades med måtten BLEU, METEOR, ROUGE och CIDEr, där modellerna får resultat i klass med andra videobeskrivningsmodeller. Vidare utvärderades beskrivningarna också av experter inom videoövervakningsområdet, som fick bedöma hur bra beskrivningarna var i måtten beskrivningsprecision, som uppnådde 62.9%, och semantisk kvalitet, som uppnådde 99.2%. Rankningssystemet utvärderades också av experterna; deras åsikter överensstämde till 78% med rankningssystemets.
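The ranking component described above, polynomial regression over per-frame scores with frames far above the fitted trend treated as events, can be sketched as follows. The frame scores here are synthetic and the polynomial degree is an assumption for illustration.

```python
import numpy as np

# Synthetic per-frame anomaly scores: a smooth trend plus one spike (an "event").
t = np.arange(20, dtype=float)
scores = 0.01 * t + 0.1
scores[12] += 2.0  # injected event at frame 12

# Fit a low-degree polynomial to the overall trend, then rank frames by
# how far above the trend they sit (the residual).
coeffs = np.polyfit(t, scores, deg=2)
trend = np.polyval(coeffs, t)
residuals = scores - trend
ranking = np.argsort(-residuals)  # frames most above trend come first

print(int(ranking[0]))  # the injected spike at frame 12 ranks first
```

The polynomial absorbs slow drift in the score signal, so only genuinely anomalous frames produce large residuals and rise to the top of the ranking.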
30

Ανάπτυξη και χρήση υπολογιστικών μεθόδων για την σχετικιστική μελέτη των αστέρων νετρονίων / Development and use of calculating methods for the relativistic study of neutron stars

Σφαέλος, Ιωάννης 20 April 2011 (has links)
In the present dissertation we solve numerically, in the complex plane, all the differential equations involved in Hartle's perturbation method for computing general-relativistic polytropic models of rotating neutron stars. We place emphasis on computing quantities describing the geometry of models in rapid rotation. Comparing with numerical results obtained by certain sophisticated iterative methods, we verify an appreciable improvement of our results over those given by the classical Hartle perturbative scheme.
The present investigation consists of four parts, as follows. In the first chapter, we describe the nonrotating neutron star model, governed by the Oppenheimer-Volkoff equations derived from Einstein's field equations and combined with a polytropic equation of state. Then, following Hartle's perturbation method, uniform rotation is added as a perturbation: the equations of structure for uniformly rotating stars are given up to second order in the angular velocity, and the distortions to mass and radius are calculated as corrections owing to spherical and quadrupole deformations. Subsequently, the equations are extended to third order in the angular velocity. In the second chapter, we describe extensively the numerical method called the Complex-Plane Strategy (abbreviated CPS). According to this method, we solve numerically in the complex plane all the differential equations involved in Hartle's perturbation method; every function of our problem is interpreted as a complex-valued function of a complex variable. CPS offers an alternative for avoiding singularities and indeterminate forms, especially near the center and the surface of the nonrotating star, by performing the numerical integration along a properly chosen complex path. Moreover, the numerical integrations of all the differential equations governing the problem are continued well beyond the surface of the nonrotating star; thus, the radius is readily calculated as a root of the real part of the density function, without our being forced to perform any numerical extrapolations. In the third chapter, we solve numerically in the complex plane the system of first-order differential equations resulting from Hartle's perturbation method. We place emphasis on computing the boundary of the rotating configuration by the so-called fine-tuning algorithm, which gives appreciably improved results. We then describe the software systems used in our investigation, with emphasis on the ATOMFT system.
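For reference, the nonrotating background model mentioned in the first chapter is governed by the standard Oppenheimer-Volkoff (TOV) structure equations, which in geometrized units ($G = c = 1$) read

```latex
\frac{dm}{dr} = 4\pi r^{2}\,\varepsilon(r),
\qquad
\frac{dP}{dr} = -\,\frac{\bigl[\varepsilon(r)+P(r)\bigr]\bigl[m(r)+4\pi r^{3}P(r)\bigr]}{r\bigl[r-2m(r)\bigr]},
```

closed by a polytropic equation of state of the form $P = K\,\varepsilon^{1+1/n}$. (Polytropes are sometimes written in terms of the rest-mass density $\rho$ rather than the total energy density $\varepsilon$; the convention used in the dissertation is not specified in this abstract.) The stellar surface is the point where the pressure, and for a polytrope the density, vanishes, which is why the radius can be read off as a root of the density function.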
Finally, we compute the third-order corrections in the uniform angular velocity to the angular momentum, moment of inertia, rotational kinetic energy, and gravitational potential energy. Furthermore, we describe a method for computing the mass-shedding limit. In the fourth chapter, we present several numerical results and some significant graphical representations, and we give certain details of our program implementation. Concluding, we address the well-known "paradox" concerning Hartle's perturbation method: although it represents a slow-rotation approximation, it gives remarkably accurate results even when applied to rapidly rotating models. In the present work, we have removed the critical limitation of terminating the integrations just below the surface of the nonrotating star; instead, the numerical integration continues well beyond the boundary of the star. This means that CPS "knows" the distortion caused by rotation over a sufficiently extended region surrounding the initially spherical configuration. Hence, in the computation of a particular rotating configuration, CPS never extrapolates beyond the end of the function tables produced by these extended numerical integrations, and it is exactly this avoidance of extrapolation that keeps the error in the computations appreciably small. Finally, by properly taking into account certain conditions matching Hartle's perturbative scheme with the relations arising in the framework of the Complex-Plane Strategy, we have devised the fine-tuning algorithm, which in turn improves appreciably the accuracy of our numerical results related to the geometry of the star's boundary. Consequently, the mass-shedding limit can be calculated with remarkable accuracy by a proper procedure.
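As a toy illustration of the Complex-Plane Strategy, the sketch below integrates the Newtonian Lane-Emden equation for an n = 1 polytrope (exact surface at the first zero of sin(x)/x, i.e. at x = pi) along a path lying slightly off the real axis, continues well past the surface, and then reads the radius off as the first root of the real part of the solution, with no extrapolation. This is a simplification for illustration only: the dissertation integrates the full relativistic Hartle equations, and the path, step size, and imaginary offset used here are assumptions.

```python
import numpy as np

def lane_emden_rhs(xi, y):
    """RHS of the Lane-Emden equation for polytropic index n = 1,
    as a first-order system: y = (theta, dtheta/dxi)."""
    theta, dtheta = y
    return np.array([dtheta, -theta - 2.0 / xi * dtheta])

def integrate_cps(h=1e-3, eps=1e-4, xi_max=4.0):
    """Integrate along the complex path xi + i*eps: the small imaginary
    offset keeps the integration away from the singular point xi = 0,
    and the run continues past the stellar surface (first zero of
    Re(theta)), so no extrapolation is ever needed."""
    xi = complex(h, eps)                  # start just off the origin
    y = np.array([1.0 + 0j, 0.0 + 0j])    # theta(0) = 1, theta'(0) = 0
    path = [(xi.real, y[0].real)]
    while xi.real < xi_max:
        # classical RK4 step, all arithmetic complex
        k1 = lane_emden_rhs(xi, y)
        k2 = lane_emden_rhs(xi + h / 2, y + h / 2 * k1)
        k3 = lane_emden_rhs(xi + h / 2, y + h / 2 * k2)
        k4 = lane_emden_rhs(xi + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        xi += h
        path.append((xi.real, y[0].real))
    return path

def surface_radius(path):
    """The surface is the first root of Re(theta); since the table
    already extends past the boundary, a local interpolation between
    two bracketing entries suffices."""
    for (x0, t0), (x1, t1) in zip(path, path[1:]):
        if t0 > 0 >= t1:
            return x0 + t0 * (x1 - x0) / (t0 - t1)
    return None
```

The point of the sketch is the workflow, not the equation: integrate everything as complex-valued functions along a path that dodges the singular points, carry the table beyond the unperturbed surface, and locate the boundary as a root of the real part, exactly the three ingredients the abstract attributes to CPS.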
