About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Automatic Transcript Generator for Podcast Files

Holst, Andy January 2010 (has links)
In the modern world, the Internet has become a popular place, yet people with speech or hearing disabilities, as well as search engines, cannot access the speech content of podcast files. To partially solve this problem, Sphinx decoders such as Sphinx-3 and Sphinx-4 can be used to implement an Automatic Transcript Generator application, either by coupling an already existing large acoustic model, language model and dictionary, or by training your own large acoustic model and language model and creating your own dictionary, to support a continuous, speaker-independent speech recognition system.
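
As a concrete illustration of the decoding step, here is a minimal sketch assuming the podcast episode has already been converted to a mono WAV file and that CMU Sphinx is reachable through the speech_recognition Python package; the file name is a hypothetical placeholder, and this is an illustration in the spirit of the abstract, not the thesis code.

```python
# Minimal Sphinx decoding sketch; "episode.wav" is a hypothetical path.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("episode.wav") as source:
    audio = recognizer.record(source)  # read the entire file into memory

try:
    # recognize_sphinx ships with a US-English acoustic model, language
    # model and dictionary; custom models and dictionaries can be supplied
    # instead, in the same spirit the abstract describes.
    transcript = recognizer.recognize_sphinx(audio)
    print(transcript)
except sr.UnknownValueError:
    print("Sphinx could not understand the audio")
```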
12

Traitements linguistiques pour la reconnaissance automatique de la parole appliquée à la langue arabe : de l'arabe standard vers l'arabe dialectal / Linguistic processing for automatic speech recognition applied to Arabic: from Standard Arabic to dialectal Arabic

Boujelbane Jarraya, Rahma 05 December 2015 (has links)
The different dialects of Arabic (DA) show large phonological, morphological, lexical and syntactic variations compared to the standard written language, Modern Standard Arabic (MSA). Until recently, these dialects existed only in oral form, and most existing resources for Arabic are limited to MSA, leading to an abundance of tools for the automatic processing of that variety. Given the significant differences between MSA and DA, the performance of these tools drops sharply when processing DA. This situation leads to a significant increase in ambiguity in computational approaches to DA. The work described in this thesis addresses this problem by modeling the spoken language used in Tunisian media. This data source contains a significant amount of code-switching (CS) between the normative language MSA and the dialect spoken in Tunisia (TD). The unpredictable presence of the latter in the discourse poses a serious problem for Natural Language Processing (NLP) and makes this spoken variety an under-resourced language. However, the resources required to model it are almost nonexistent. The objective of this thesis is therefore to fill this gap in order to build a language model dedicated to an automatic speech recognition system for the spoken language of Tunisian media. To that end, we describe a resource-creation methodology and evaluate it on a language modeling task. The results obtained are encouraging.
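
To make the language modeling task concrete, here is a toy bigram model with add-one smoothing of the kind such resources could be evaluated with (via perplexity); the corpus below is an invented stand-in, not the Tunisian media data, and the smoothing choice is ours.

```python
# Toy bigram language model with add-one smoothing, scored by perplexity.
from collections import Counter
import math

def train_bigram(sentences):
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def perplexity(sentence, unigrams, bigrams):
    tokens = ["<s>"] + sentence + ["</s>"]
    vocab = len(unigrams)
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # add-one smoothing keeps nonzero mass for unseen (e.g. code-switched) bigrams
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))

corpus = [["the", "news", "starts", "now"], ["the", "news", "ends"]]
uni, bi = train_bigram(corpus)
print(perplexity(["the", "news", "starts"], uni, bi))
```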
13

Connectivity Gradient in the Human Left Inferior Frontal Gyrus: Intraoperative Cortico-Cortical Evoked Potential Study / ヒト左下前頭回における結合性勾配について―術中皮質刺激皮質誘発電位による研究

Nakae, Takuro 27 July 2020 (has links)
Kyoto University / 0048 / New-system doctoral course / Doctor of Medical Science / 甲第22693号 / 医博第4637号 / 新制||医||1045 (University Library) / Kyoto University Graduate School of Medicine, Medical Science / (Chief examiner) Professor Jun Takahashi, Professor Yasunori Hayashi, Professor Tadashi Isa / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Medical Science / Kyoto University / DFAM
14

A retrieval-based chatbot's opinion on the trolley problem

Björklin, Hampus, Abrahamsson, Tim, Widenfalk, Oscar January 2021 (has links)
The goal of this project was to create a chatbot capable of debating a user using limited resources, including a discussion thread from the online debate forum Kialo. A retrieval-based bot was designed, and the discussion thread was converted into a database from which the bot could choose an appropriate answer. Which answer is appropriate is decided by the bot using a few key features of a given input sentence; the main features are word similarity, sentiment distance and BERT encoding (a model for vector representation of text created by Google). The similarity of these features was then used to score claims from the dataset, and a weighted combination of the scores was used to find the best response to a given input sentence. The most successful of the features was the BERT encoding. Once the bot had been refined, it was brought online and tested using the communication platform Discord.
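
The weighted feature combination can be sketched as below; the weights and the embed()/sentiment() helpers are hypothetical stand-ins for a BERT encoder and a sentiment analyzer, not the project's actual components.

```python
# Sketch of scoring candidate claims by a weighted feature combination.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def word_similarity(s1, s2):
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / max(len(w1 | w2), 1)  # Jaccard word overlap

def score_claim(user_input, claim, embed, sentiment, weights=(0.3, 0.2, 0.5)):
    features = (
        word_similarity(user_input, claim),
        1.0 - abs(sentiment(user_input) - sentiment(claim)),  # sentiment distance
        cosine(embed(user_input), embed(claim)),              # BERT encoding
    )
    return sum(w * f for w, f in zip(weights, features))

def best_response(user_input, claims, embed, sentiment):
    # pick the claim from the database whose combined score is highest
    return max(claims, key=lambda c: score_claim(user_input, c, embed, sentiment))
```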
15

Vícevrstvé aplikace v prostředí .NET / Multilayer applications in .NET environment

Palkech, Marek January 2009 (has links)
This thesis presents the Model-View-Controller (MVC) pattern. It focuses on describing the principle and function of the pattern's individual layers, the reasons the three-layer architecture was invented, and the advantages and disadvantages the pattern provides. The most frequent implementation of MVC, accessing data stored in a database through a web user interface, is also described (see the sketch after this paragraph). The next part of the thesis concentrates on the .NET Framework platform, which consists of a voluminous, language-neutral library (essentially a large collection of source code providing solutions to common programming problems) and the runtime used to execute applications created in the .NET environment. The chapter concerning the .NET Framework describes its architecture, the platform's origins, the various versions of .NET, the ADO.NET data access technology and the ASP.NET ObjectDataSource component. The chapter on languages supported by the .NET Framework focuses on the C# language and its versions. The application "Multilayer applications in .NET environment" is a practical implementation of the mentioned technologies and is described in the last chapter, including the application's architecture, with attention to how the particular Model-View-Controller layers are implemented as Microsoft Visual Studio 2005 projects. Special attention is paid to the operations over the data stored in the database tables that the application enables the user to execute, such as inserting, updating, selecting and deleting data. The process of generating a common business object's children is also described in detail.
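
A language-agnostic toy of the Model-View-Controller split described above; the thesis implements it in C# on .NET, and Python is used here only to keep the sketch short, so the class and method names are our own illustration.

```python
# Toy MVC: the Model owns the data, the View renders it, the Controller mediates.
class Model:
    def __init__(self):
        self.rows = []  # stands in for a database table

    def insert(self, row):
        self.rows.append(row)

    def select_all(self):
        return list(self.rows)

class View:
    def render(self, rows):
        for i, row in enumerate(rows):
            print(f"{i}: {row}")

class Controller:
    def __init__(self, model, view):
        self.model, self.view = model, view

    def add(self, row):
        self.model.insert(row)                      # user action mutates the Model
        self.view.render(self.model.select_all())   # the View reflects the new state

Controller(Model(), View()).add({"name": "example"})
```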
16

Adaptace rozpoznávače řeči na datech bez přepisu / Unsupervised Adaptation of Speech Recognizer

Švec, Ján January 2015 (has links)
The goal of this thesis is to design and test techniques for unsupervised adaptation of speech recognizers on audio data without any textual transcripts. A training set is prepared first, and a baseline speech recognition system is trained. This system is used to transcribe unseen data. We experiment with an adaptation data selection process based on a measure of transcript quality; the system is then re-trained on this new set and its accuracy is evaluated. Finally, we experiment with the amount of adaptation data.
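
The selection-and-retraining loop reads roughly as sketched below; transcribe(), confidence() and train() are hypothetical placeholders for the recognizer's decoder, the transcript quality measure and the training procedure, and the threshold value is an assumption.

```python
# Sketch of one round of unsupervised (self-training) adaptation.
def adapt(baseline_model, untranscribed_audio, transcribe, confidence, train,
          threshold=0.8):
    # 1. Let the baseline system transcribe the unseen audio.
    hypotheses = [(utt, transcribe(baseline_model, utt))
                  for utt in untranscribed_audio]
    # 2. Keep only utterances whose automatic transcript looks reliable.
    selected = [(utt, text) for utt, text in hypotheses
                if confidence(baseline_model, utt, text) >= threshold]
    # 3. Re-train on the self-labeled data and return the adapted model.
    return train(baseline_model, selected)
```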
17

EXPLORATORY SEARCH USING VECTOR MODEL AND LINKED DATA

Daeun Yim (9143660) 30 July 2020 (has links)
The way people acquire knowledge has largely shifted from print to web resources, and search has become the main medium for accessing information. Among various search behaviors, exploratory search represents a learning process that involves complex cognitive activities and knowledge acquisition. Research on exploratory search studies how search systems can help people seek information and develop intellectual skills. This research focuses on information retrieval and aims to build an exploratory search system that shows higher clustering performance and more diversified search results. In this study, a new language model that integrates a state-of-the-art vector language model (i.e., BERT) with human knowledge is built to better understand and organize search results. The clustering performance of the new model (i.e., RDF+BERT) was similar to that of the original model, but a slight improvement was observed on conversational texts compared to the pre-trained language model and an exploratory search baseline. With the addition of an enrichment phase that expands search results to related documents, the novel system can also display more diverse search results.
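
The clustering side of such a system can be sketched as follows; the encoder name and the use of scikit-learn's KMeans are illustrative choices, and the RDF-based knowledge enrichment that distinguishes the thesis's RDF+BERT model is not reproduced here.

```python
# Sketch: cluster retrieved documents on BERT-style sentence embeddings.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

results = ["first retrieved document ...",
           "second retrieved document ...",
           "third retrieved document ..."]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any BERT-family encoder
embeddings = encoder.encode(results)

labels = KMeans(n_clusters=2, n_init="auto").fit_predict(embeddings)
for doc, label in zip(results, labels):
    print(label, doc[:40])  # group search results by cluster label
```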
18

Cyberbullying Detection on social platforms using Large Language Models

Ottosson, Dan January 2023 (has links)
Social media and platforms utilise moderation to remove unwanted content such as cyberbullying, an aggressive act towards an individual or group that occurs over any type of digital technology, e.g. social platforms. However, moderating platforms manually is nearly impossible, and the demand for automatic moderation is rising. Research on technical solutions for cyberbullying detection on social platforms is scarce and is mostly focused on Machine Learning models that detect cyberbullying without the connection to platform moderation. This study aims to enhance the research on cyberbullying detection models by using a GPT-3 Large Language Model and to reduce the gap to platform moderation. The model is tweaked and tested to detect cyberbullying using popular cyberbullying datasets and compared to previous Machine Learning and Large Language models using common performance metrics. Furthermore, the latency of the model is measured to test whether it can be used as an auto-moderation tool to detect cyberbullying on social platforms. The results show that the model is on par with the previous models and that fine-tuning a Large Language model is the preferred way to tweak the model for cyberbullying detection. Further, the results show that Large Language models have higher latency than Machine Learning models, but that the latency can be improved by using multiple threads, and that such models can be used as a platform moderation tool to detect cyberbullying.
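
The latency measurement with and without threading can be sketched as below; classify() is a stub standing in for a call to a hosted Large Language Model (the study used GPT-3), not a real API binding, and the sleep duration is an invented placeholder for network and inference time.

```python
# Sketch: compare serial vs. threaded moderation latency over a message batch.
import time
from concurrent.futures import ThreadPoolExecutor

def classify(message):
    time.sleep(0.5)  # stand-in for network + model inference latency
    return "bullying" if "idiot" in message.lower() else "ok"

messages = ["you are an idiot", "nice game today", "nobody likes you"] * 4

start = time.perf_counter()
serial = [classify(m) for m in messages]
print(f"serial:   {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    threaded = list(pool.map(classify, messages))  # requests overlap in flight
print(f"threaded: {time.perf_counter() - start:.2f}s")
```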
19

Probability of Belonging to a Language

Cook, Kevin Michael Brooks 16 April 2013 (has links) (PDF)
Conventional language models estimate the probability that a word sequence within a chosen language will occur. By contrast, the purpose of our work is to estimate the probability that the word sequence belongs to the chosen language. The language of interest in our research is comprehensible well-formed English. We explain how conventional language models assume what we refer to as a degree of generalization, the extent to which a model generalizes from a given sequence. We explain why such an assumption may hinder estimation of the probability that a sequence belongs. We show that the probability that a word sequence belongs to a chosen language (represented by a given sequence) can be estimated by avoiding an assumed degree of generalization, and we introduce two methods for doing so: Minimal Number of Segments (MINS) and Segment Selection. We demonstrate that in some cases both MINS and Segment Selection perform better at distinguishing sequences that belong from those that do not than any other method we tested, including Good-Turing, interpolated modified Kneser-Ney, and the Sequence Memoizer.
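
A rough sketch of the minimal-number-of-segments idea as we read it: count how few contiguous segments the test sequence can be split into such that every segment also occurs in the reference sequence, with fewer segments suggesting the sequence belongs. This is our illustration, not the authors' implementation; greedy longest-prefix matching suffices here because any prefix of a matchable segment is itself matchable.

```python
# Sketch of minimal segmentation of a test sequence against a reference.
def occurs_in(segment, reference):
    n = len(segment)
    return any(reference[i:i + n] == segment
               for i in range(len(reference) - n + 1))

def min_segments(test, reference):
    count, i = 0, 0
    while i < len(test):
        # greedily take the longest prefix of the remainder found in reference
        j = len(test)
        while j > i and not occurs_in(test[i:j], reference):
            j -= 1
        if j == i:
            return None  # a word never seen in the reference: no cover exists
        count, i = count + 1, j
    return count

reference = "the cat sat on the mat".split()
print(min_segments("the cat sat".split(), reference))     # 1 segment
print(min_segments("the mat sat on".split(), reference))  # 2 segments
```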
20

Discovering and Using Implicit Data for Information Retrieval

Yi, Xing 01 September 2011 (has links)
In real-world information retrieval (IR) tasks, the searched items and/or the users' queries often have implicit information associated with them -- information that describes unspecified aspects of the items or queries. For example, in web search tasks, web pages are often pointed to by hyperlinks (known as anchors) from other pages, and thus have human-generated succinct descriptions of their content (anchor text) associated with them. This indirectly available information has been shown to improve search effectiveness for different retrieval tasks. However, in many real-world IR challenges this information is sparse in the data; i.e., it is incomplete or missing in a large portion of the data. In this work, we explore how to discover and use implicit information in large amounts of data in the context of IR. We present a general perspective for discovering implicit information and demonstrate how to use the discovered data in four specific IR challenges: (1) finding relevant records in semi-structured databases where many records contain incomplete or empty fields; (2) searching web pages that have little or no associated anchor text; (3) using click-through records in web query logs to help search pages that have no or very few clicks; and (4) discovering plausible geographic locations for web queries that contain no explicit geographic information. The intuition behind our approach is that data similar in some aspects are often similar in other aspects. Thus we can (a) use the observed information of queries/documents to find similar queries/documents, and then (b) utilize those similar queries/documents to reconstruct plausible implicit information for the original queries/documents. We develop language-modeling-based techniques to effectively use content similarity among data for our work. Using the four different search tasks on large-scale noisy datasets, we empirically demonstrate the effectiveness of our approach. We further discuss the advantages and weaknesses of two complementary approaches within our general perspective of handling implicit information for retrieval purposes. Taken together, we describe a general perspective that uses contextual similarity among data to discover implicit information for IR challenges. Using this general perspective, we formally present two language-modeling-based information discovery approaches. We empirically evaluate our approaches using different IR challenges. Our research shows that supporting information discovery tailored to different search tasks can enhance IR systems' search performance and improve users' search experience.
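
The "similar items share aspects" intuition can be sketched as below: reconstruct a missing field (here, anchor text) for a document by borrowing it from the most similar documents that do have it. TF-IDF cosine similarity and the toy documents stand in for the language-modeling-based similarity the thesis develops.

```python
# Sketch: fill a missing implicit field from the nearest documents that have it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["guide to hiking boots and trails",
        "trail running shoes review",
        "mountain hiking gear checklist"]
anchors = ["hiking boots", "running shoes", None]  # None marks a missing field

tfidf = TfidfVectorizer().fit_transform(docs)
sims = cosine_similarity(tfidf)

for i, anchor in enumerate(anchors):
    if anchor is None:
        # rank the other documents by content similarity, skipping those
        # that are also missing the field, and borrow from the closest one
        neighbors = sorted((j for j in range(len(docs))
                            if j != i and anchors[j] is not None),
                           key=lambda j: sims[i, j], reverse=True)
        borrowed = anchors[neighbors[0]]
        print(f"doc {i}: borrowing implicit anchor text {borrowed!r}")
```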
