31 |
Användning av generativ AI inom digital innovation : En kvalitativ studie ur innovatörers perspektiv / The use of generative AI in digital innovation : A qualitative study through the lens of innovators Süvari, Andreas, Wallmark, Rebecca January 2023
Accelerated by technology, development is progressing faster than ever. Generative AI has become accessible to the general public, which lets organisations use AI technology without significant effort or expertise. This shifts the conditions for digital innovation. The new actor creates gaps in the literature, and previous research needs to be re-evaluated. One important research area is how the use of generative AI affects digital innovation; another is how innovators can use and relate to generative AI in the innovation process. To investigate this, a qualitative study was conducted, with empirical data collected through eight interviews. The study resulted in a thematic model with the following themes: Generative AI as a colleague; Generative AI as a resource for digital innovation; Generative AI increases accessibility to AI technology; Emotions regarding generative AI; Challenges regarding generative AI; Diverse and differentiated views on digital innovation. The study shows that generative AI can affect digital innovation through these themes. The themes are also related to the innovation process, for which a modified process model has been developed. Since the use of generative AI is a relatively new phenomenon, innovators are likely to increase their use of the tool, which means the findings of this study risk becoming outdated quickly. Further research should therefore conduct similar studies at regular intervals to capture new experiences arising from the increased usage.
|
32 |
Adaptive Brain-Computer Interface Systems For Communication in People with Severe Neuromuscular Disabilities Mainsah, Boyla O. January 2016
<p>Brain-computer interfaces (BCI) have the potential to restore communication or control abilities in individuals with severe neuromuscular limitations, such as those with amyotrophic lateral sclerosis (ALS). The role of a BCI is to extract and decode relevant information that conveys a user's intent directly from brain electro-physiological signals and translate this information into executable commands to control external devices. However, the BCI decision-making process is error-prone due to noisy electro-physiological data, representing the classic problem of efficiently transmitting and receiving information via a noisy communication channel. </p><p>This research focuses on P300-based BCIs which rely predominantly on event-related potentials (ERP) that are elicited as a function of a user's uncertainty regarding stimulus events, in either an acoustic or a visual oddball recognition task. The P300-based BCI system enables users to communicate messages from a set of choices by selecting a target character or icon that conveys a desired intent or action. P300-based BCIs have been widely researched as a communication alternative, especially in individuals with ALS who represent a target BCI user population. For the P300-based BCI, repeated data measurements are required to enhance the low signal-to-noise ratio of the elicited ERPs embedded in electroencephalography (EEG) data, in order to improve the accuracy of the target character estimation process. As a result, BCIs have relatively slower speeds when compared to other commercial assistive communication devices, and this limits BCI adoption by their target user population. The goal of this research is to develop algorithms that take into account the physical limitations of the target BCI population to improve the efficiency of ERP-based spellers for real-world communication. 
</p><p>In this work, it is hypothesised that building adaptive capabilities into the BCI framework can potentially give the BCI system the flexibility to improve performance by adjusting system parameters in response to changing user inputs. The research in this work addresses three potential areas for improvement within the P300 speller framework: information optimisation, target character estimation and error correction. The visual interface and its operation control the method by which the ERPs are elicited through the presentation of stimulus events. The parameters of the stimulus presentation paradigm can be modified to modulate and enhance the elicited ERPs. A new stimulus presentation paradigm is developed in order to maximise the information content that is presented to the user by tuning stimulus paradigm parameters to positively affect performance. Internally, the BCI system determines the amount of data to collect and the method by which these data are processed to estimate the user's target character. Algorithms that exploit language information are developed to enhance the target character estimation process and to correct erroneous BCI selections. In addition, a new model-based method to predict BCI performance is developed, an approach which is independent of stimulus presentation paradigm and accounts for dynamic data collection. The studies presented in this work provide evidence that the proposed methods for incorporating adaptive strategies in the three areas have the potential to significantly improve BCI communication rates, and the proposed method for predicting BCI performance provides a reliable means to pre-assess BCI performance without extensive online testing.</p> / Dissertation
|
33 |
Automatic Subtitle Generation for Sound in Videos Guenebaut, Boris January 2009
<p>The last ten years have seen the emergence of every kind of video content, and the appearance of websites dedicated to it has increased the importance the public attaches to video. At the same time, some people are deaf and cannot always understand such videos because no text transcription is available. Solutions are therefore needed to make this media accessible to most people. Several software tools offer utilities for creating subtitles, but all require extensive user participation, so a more automated approach is envisaged. This thesis describes a way to generate standards-compliant subtitles using speech recognition. The process has three parts: the first separates the audio from the video and, if necessary, converts the audio to a suitable format; the second recognises the speech contained in the audio; the third generates a subtitle file from the recognition results. Implementation directions are proposed for the three modules. The experimental results were not satisfactory, and adjustments are needed in further work. Decoding parallelisation, the use of well-trained models, and punctuation insertion are among the improvements still to be made.</p>
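The third stage described in the abstract, turning recognition results into a subtitle file, can be illustrated with a minimal sketch. It assumes the SubRip (SRT) format as the target standard and assumes that the recogniser yields `(start_seconds, end_seconds, text)` segments; neither assumption comes from the abstract, and the function names are hypothetical.

```python
def format_timestamp(seconds):
    """Render a time offset in the SRT form HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Build SRT text from (start, end, text) segments (assumed recogniser output)."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: running number, time range, then the recognised text.
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by one blank line.
    return "\n".join(blocks)


# Hypothetical recognition results, for illustration only.
example = to_srt([(0.0, 1.5, "Hello"), (2.0, 4.25, "Welcome to the video")])
```

Writing `example` to a `.srt` file next to the video is enough for most players to pick it up; audio extraction and recognition themselves are outside this sketch.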
|
34 |
Language Modeling For Turkish Continuous Speech Recognition Sahin, Serkan 01 December 2003
This study aims to build a new language model for Turkish continuous speech recognition. Because of its agglutinative nature, Turkish is a very productive language in terms of word forms. For such languages the vocabulary size is far from acceptable: from a single stem, thousands of new words can be generated using inflectional and derivational suffixes. In this work, words are parsed into their stems and endings. First, endings are treated as words and bigram probabilities are obtained using both stems and endings. Then, bigram probabilities are obtained using only the stems. Single-pass recognition was performed using these bigram probabilities. Next, two-pass recognition was performed: first, the previous bigram probabilities were used to create word lattices; second, trigram probabilities were obtained from a larger text; finally, one-best results were obtained using the word lattices and the trigram probabilities. All work was done in the Hidden Markov Model Toolkit (HTK) environment, except for parsing and network transformation.
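The idea of treating endings as words when estimating bigrams can be sketched as follows. This is a minimal illustration, not the thesis's parser or its HTK pipeline: the suffix list is a tiny hypothetical sample, the longest-suffix split is a stand-in for real morphological parsing, and the estimates are unsmoothed maximum-likelihood values.

```python
from collections import Counter


def split_word(word, endings):
    """Split a word into [stem, '+ending'] using the longest matching suffix."""
    for ending in sorted(endings, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending):
            return [word[: -len(ending)], "+" + ending]
    return [word]  # no known ending: keep the word as a bare stem


def bigram_probabilities(sentences, endings):
    """Maximum-likelihood P(next | previous) over stem and ending tokens."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"]
        for word in sentence.split():
            tokens.extend(split_word(word, endings))
        tokens.append("</s>")
        history_counts.update(tokens[:-1])  # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


# Hypothetical suffix inventory and toy corpus, for illustration only.
ENDINGS = {"ler", "lar", "de", "da"}
probs = bigram_probabilities(["evler", "evde", "okullar"], ENDINGS)
```

In this toy corpus the stem `ev` is followed by `+ler` once and by `+de` once, so each ending receives probability 0.5 after `ev`. A stems-only model would drop the `+ending` tokens before counting.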
|
36 |
Approches numériques pour le filtrage de documents centrés sur une entité : un modèle diachronique et des méta critères / Entity centric document filtering using numerical approaches : a diachronical model and meta criteria Bouvier, Vincent 16 December 2015
[...] Our main contributions are:
1. We propose an entity-centric document classification system that finds documents related to an entity based on an entity profile and a set of meta criteria, and we use the classification result to filter out unrelated documents. The approach is entity-independent and uses transfer learning principles: the classifier is trained on annotated documents for one pool of entities and then categorises documents about other entities for which no annotated data was provided.
2. We introduce a diachronic language model that extends the definition of the entity profile so that the profile can be updated. Tracking a named entity requires distinguishing already known information from new information. The diachronic language model enables automatic updating of the entity profile while minimising the noise introduced.
3. We develop a method to detect the popularity of an entity in order to improve the coherence of a classification model across all temporal aspects related to an entity. To detect the importance of a document with respect to an entity, we propose using, among other things, temporal indicators, which may vary from one entity to another. We group entities by their popularity on the Web at each time t to improve the coherence of the models and thus the performance of the classifiers. [...]
|
37 |
Využití neanotovaných dat pro trénování OCR / OCR Trained with Unanotated Data Buchal, Petr January 2021
Creating a high-quality optical character recognition (OCR) system requires a large amount of labeled data, and obtaining or creating that much labeled data is costly. This thesis focuses on several methods that efficiently use unlabeled data to train an OCR neural network. The proposed methods fall into the category of self-training algorithms and share the following general approach. First, a seed model is trained on a limited amount of labeled data. Then, the seed model, in combination with a language model, is used to produce pseudo-labels for unlabeled data. The machine-labeled data are then combined with the training data used to create the seed model, and the combined set is used to train the target model. The performance of the individual methods is measured on the handwritten ICFHR 2014 Bentham dataset. Experiments were conducted on two datasets representing different degrees of labeled data availability. The best model trained on the smaller dataset achieved a character error rate (CER) of 3.70 %, a relative improvement of 42 % over the seed model, and the best model trained on the bigger dataset achieved a CER of 1.90 %, a relative improvement of 26 % over the seed model. This thesis shows that the proposed methods can be used efficiently to improve the OCR error rate by means of unlabeled data.
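The two metrics quoted above can be made concrete with a short sketch. It assumes that CER is the character-level Levenshtein distance divided by the reference length, and that relative improvement is the reduction in CER as a fraction of the seed model's CER; these are the usual definitions, but the abstract does not spell them out, and the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + (ref_char != hyp_char),  # substitution
                )
            )
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def relative_improvement(seed_cer, target_cer):
    """Reduction of the error rate in percent of the seed model's error rate."""
    return 100.0 * (seed_cer - target_cer) / seed_cer
```

For example, `cer("abcd", "abed")` is 25.0, since one of four reference characters is wrong. Under the assumed definition, a model that lowers a seed CER of 10.0 to 5.8 shows a relative improvement of 42 %.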
|
38 |
Rozpoznávání řeči s pomocí nástroje Sphinx-4 / Speech recognition using Sphinx-4 Kryške, Lukáš January 2014
This diploma thesis aims to find an effective method for continuous speech recognition. More precisely, it uses speech-to-text recognition for keyword spotting. The solution is applicable to the analysis of phone calls and similar applications. Most of the thesis describes and applies the Sphinx-4 speech recognition framework, which uses hidden Markov models (HMMs) to define the acoustic models of a language. It explains how these models can be trained for a new language or a new dialect. Finally, it describes in detail how to implement keyword spotting in Java.
|
39 |
Object Classification using Language Models From, Gustav January 2022
In today's digital world, more and more emails and messages must be sent, processed and handled. Categorising and classifying these texts can take a very long time and costs companies both time and money. If the classification could be done automatically by a computer based on the content of the text, it would be a major gain for Easit AB and its customers. To facilitate text classification, Easit needs a two-part solution consisting of a language model and a classifier. The language model converts raw text into a vector that represents the text, and the classifier determines which predefined labels fit the vector. The goal is not to create the best solution but to build a general understanding of different language and classifier models and of how to design a system that is both fast and accurate. BERT models were the primary language models during evaluation, but doc2Vec and One-Hot encoding were also tested. The classifiers were boundary-condition models or dense neural networks, all trained without knowledge of which language model the text vectors came from. The validation accuracy on the IMDB comment dataset with BERT was between 75% and 94%, depending mostly on the language model rather than on the classifier. Neural networks are the best fit as classifiers, mainly because of their scalability to many labels. The work resulted in a recommendation to Easit for an alternative-based system solution.
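The two-part design described in this abstract, an encoder that turns text into a vector and a classifier that sees only vectors, can be sketched in a few lines. This is an illustration under stated assumptions rather than the thesis's system: the encoder is a simple one-hot (bag-of-words presence) vector, the classifier is a nearest-centroid rule swapped in for brevity, and the training texts are invented.

```python
def build_vocab(texts):
    """Map every word seen in the training texts to a vector position."""
    words = sorted({word for text in texts for word in text.lower().split()})
    return {word: index for index, word in enumerate(words)}


def encode(text, vocab):
    """One-hot style encoding: 1.0 where a vocabulary word occurs in the text."""
    vector = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vector[vocab[word]] = 1.0
    return vector


class NearestCentroid:
    """Classifier that only sees vectors, not the encoder that produced them."""

    def fit(self, vectors, labels):
        grouped = {}
        for vector, label in zip(vectors, labels):
            grouped.setdefault(label, []).append(vector)
        # Centroid = component-wise mean of the vectors of each label.
        self.centroids = {
            label: [sum(column) / len(group) for column in zip(*group)]
            for label, group in grouped.items()
        }
        return self

    def predict(self, vector):
        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(vector, centroid))

        return min(self.centroids, key=lambda label: squared_distance(self.centroids[label]))


# Invented toy data, for illustration only.
texts = ["great wonderful film", "wonderful acting", "boring terrible film", "terrible plot"]
labels = ["positive", "positive", "negative", "negative"]
vocab = build_vocab(texts)
model = NearestCentroid().fit([encode(t, vocab) for t in texts], labels)
prediction = model.predict(encode("a wonderful film", vocab))
```

Because `NearestCentroid` depends only on vector length, `encode` could be replaced by any other text-to-vector function without touching the classifier, which mirrors the separation the abstract describes.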
|
40 |
Pre-training a knowledge enhanced model in biomedical domain for information extraction Yan, Xi January 2022
While recent years have seen a rise in research on knowledge-graph-enriched pre-trained language models (PLMs), few studies have tried to transfer this work to the biomedical domain. This thesis is a first attempt to pre-train a large-scale biological knowledge-enriched language model (KPLM). Using the framework of CoLAKE (T. Sun et al., 2020), a general-domain KPLM, the model in this study is pre-trained on PubMed abstracts (a large-scale medical text dataset) and BIKG (AstraZeneca's biological knowledge graph). We first obtain abstracts from PubMed together with their entity linking results. The entities from the abstracts are then connected to BIKG to form sub-graphs. These sub-graphs and sentences from the PubMed abstracts are then passed to the CoLAKE model for pre-training. By training the model on three objectives (masking word nodes, masking entity nodes and masking relation nodes), this research aims not only to enhance the model's capacity for modelling natural language but also to infuse in-depth knowledge. The model is then fine-tuned on named entity recognition (NER) and relation extraction tasks on three benchmark datasets: ChemProt (Kringelum et al., 2016), DrugProt (from the Text mining drug-protein/gene interactions shared task) and DDI (Segura-Bedmar et al., 2013). Empirical results show that the model outperforms state-of-the-art models on the relation extraction task on the DDI dataset, with an F1 score of 91.2%. On DrugProt and ChemProt, the model also shows improvement over the SciBERT baseline.
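The three masking objectives mentioned above can be pictured with a small sketch. It is not CoLAKE's implementation: the node representation as `(type, token)` pairs, the masking rate, and the example nodes are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"


def mask_nodes(nodes, node_type, rate, rng):
    """Mask a fraction of the nodes of one type; return (masked_nodes, targets).

    nodes: list of (type, token) pairs, where type is "word", "entity" or
    "relation" (a hypothetical flat view of a sentence plus its sub-graph).
    targets maps each masked position to the token the model should predict.
    """
    candidates = [i for i, (kind, _) in enumerate(nodes) if kind == node_type]
    count = max(1, round(rate * len(candidates))) if candidates else 0
    chosen = set(rng.sample(candidates, count))
    masked = [
        (kind, MASK) if i in chosen else (kind, token)
        for i, (kind, token) in enumerate(nodes)
    ]
    targets = {i: nodes[i][1] for i in chosen}
    return masked, targets


# Hypothetical sentence tokens followed by linked graph nodes.
nodes = [
    ("word", "aspirin"),
    ("word", "inhibits"),
    ("word", "COX-1"),
    ("entity", "ENT_aspirin"),
    ("relation", "REL_inhibits"),
    ("entity", "ENT_COX1"),
]
masked, targets = mask_nodes(nodes, "entity", 0.5, random.Random(0))
```

Calling `mask_nodes` once per node type (`"word"`, `"entity"`, `"relation"`) yields the three kinds of prediction targets; how those targets are weighted during training is not covered here.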
|