Global ETD Search

121	Different Contributions to Cost-Effective Transcription and Translation of Video Lectures Silvestre Cerdà, Joan Albert 05 April 2016 (has links) [EN] In recent years, on-line multimedia repositories have experienced strong growth that has consolidated them as essential knowledge assets, especially in the area of education, where large repositories of video lectures have been built in order to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated, owing to a lack of cost-effective solutions that give sufficiently accurate results. Solutions of this kind are clearly necessary in order to make these lectures accessible to speakers of different languages and to people with hearing disabilities. They would also facilitate lecture searchability and analysis functions, such as classification, recommendation or plagiarism detection, as well as the development of advanced educational functionalities like content summarisation to assist student note-taking. For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures to a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art techniques in Automatic Speech Recognition and Machine Translation into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. We also explore the potential benefits of exploiting the information known a priori about these repositories, that is, lecture-specific knowledge such as speaker, topic or slides, to create specialised, in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios through several objective and subjective evaluations, with very positive results. 
The main outcome derived from this thesis, The transLectures-UPV Platform, has been publicly released as an open-source software, and, at the time of writing, it is serving automatic transcriptions and translations for several thousands of video lectures in many Spanish and European universities and institutions. / [ES] Durante estos últimos años, los repositorios multimedia on-line han experimentado un gran crecimiento que les ha hecho establecerse como fuentes fundamentales de conocimiento, especialmente en el área de la educación, donde se han creado grandes repositorios de vídeo charlas educativas para complementar e incluso reemplazar los métodos de enseñanza tradicionales. No obstante, la mayoría de estas charlas no están transcritas ni traducidas debido a la ausencia de soluciones de bajo coste que sean capaces de hacerlo garantizando una calidad mínima aceptable. Soluciones de este tipo son claramente necesarias para hacer que las vídeo charlas sean más accesibles para hablantes de otras lenguas o para personas con discapacidades auditivas. Además, dichas soluciones podrían facilitar la aplicación de funciones de búsqueda y de análisis tales como clasificación, recomendación o detección de plagios, así como el desarrollo de funcionalidades educativas avanzadas, como por ejemplo la generación de resúmenes automáticos de contenidos para ayudar al estudiante a tomar apuntes. Por este motivo, el principal objetivo de esta tesis es desarrollar una solución de bajo coste capaz de transcribir y traducir vídeo charlas con un nivel de calidad razonable. Más específicamente, abordamos la integración de técnicas estado del arte de Reconocimiento del Habla Automático y Traducción Automática en grandes repositorios de vídeo charlas educativas para la generación de subtítulos multilingües de alta calidad sin requerir intervención humana y con un reducido coste computacional. 
Además, también exploramos los beneficios potenciales que conllevaría la explotación de la información de la que disponemos a priori sobre estos repositorios, es decir, conocimientos específicos sobre las charlas tales como el locutor, la temática o las transparencias, para crear sistemas de transcripción y traducción especializados mediante técnicas de adaptación masiva. Las soluciones propuestas en esta tesis han sido testeadas en escenarios reales llevando a cabo numerosas evaluaciones objetivas y subjetivas, obteniendo muy buenos resultados. El principal legado de esta tesis, The transLectures-UPV Platform, ha sido liberado públicamente como software de código abierto, y, en el momento de escribir estas líneas, está sirviendo transcripciones y traducciones automáticas para diversos miles de vídeo charlas educativas en numerosas universidades e instituciones españolas y europeas. / [CA] Durant aquests darrers anys, els repositoris multimèdia on-line han experimentat un gran creixement que els ha fet consolidar-se com a fonts fonamentals de coneixement, especialment a l'àrea de l'educació, on s'han creat grans repositoris de vídeo xarrades educatives per tal de complementar o inclús reemplaçar els mètodes d'ensenyament tradicionals. No obstant això, la majoria d'aquestes xarrades no estan transcrites ni traduïdes degut a l'absència de solucions de baix cost capaces de fer-ho garantint una qualitat mínima acceptable. Solucions d'aquest tipus són clarament necessàries per a fer que les vídeo xarrades siguen més accessibles per a parlants d'altres llengües o per a persones amb discapacitats auditives. A més, aquestes solucions podrien facilitar l'aplicació de funcions de cerca i d'anàlisi tals com classificació, recomanació o detecció de plagis, així com el desenvolupament de funcionalitats educatives avançades, com per exemple la generació de resums automàtics de continguts per ajudar a l'estudiant a prendre anotacions. 
Per aquest motiu, el principal objectiu d'aquesta tesi és desenvolupar una solució de baix cost capaç de transcriure i traduir vídeo xarrades amb un nivell de qualitat raonable. Més específicament, abordem la integració de tècniques estat de l'art de Reconeixement de la Parla Automàtic i Traducció Automàtica en grans repositoris de vídeo xarrades educatives per a la generació de subtítols multilingües d'alta qualitat sense requerir intervenció humana i amb un reduït cost computacional. A més, també explorem els beneficis potencials que comportaria l'explotació de la informació de la que disposem a priori sobre aquests repositoris, és a dir, coneixements específics sobre les xarrades tals com el locutor, la temàtica o les transparències, per a crear sistemes de transcripció i traducció especialitzats mitjançant tècniques d'adaptació massiva. Les solucions proposades en aquesta tesi han estat testejades en escenaris reals duent a terme nombroses avaluacions objectives i subjectives, obtenint molt bons resultats. El principal llegat d'aquesta tesi, The transLectures-UPV Platform, ha sigut alliberat públicament com a programari de codi obert, i, en el moment d'escriure aquestes línies, està servint transcripcions i traduccions automàtiques per a diversos milers de vídeo xarrades educatives en nombroses universitats i institucions espanyoles i europees. / Silvestre Cerdà, JA. (2016). Different Contributions to Cost-Effective Transcription and Translation of Video Lectures [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/62194 Artificial Intelligence Machine Learning Pattern Recognition Language Technologies Natural Language Processing Audio Segmentation Automatic Speech Recognition Machine Translation Language Modelling Massive Adaptation Intelligent Interaction Education Technology Enhanced Learning Video Lectures Multilingualism Accessibility Recommender Systems LENGUAJES Y SISTEMAS INFORMATICOS
122	An investigation of factors responsible for the dropout rates at Gert Sibande FET College Masemola, Tebogo Percians Portia 06 1900 (has links) The study investigated the factors responsible for student dropout rates at Gert Sibande FET College. A quantitative approach was used: participants were selected by random sampling, and data were collected using a questionnaire in a Likert-scale format. The study was limited to students at Gert Sibande FET College’s two campuses, namely Evander and Sibanesetfu. The findings revealed that socio-economic factors, institutional policies and funding strongly explain the prevalent dropout rates at these two campuses. It is recommended that the adoption of a student-centred funding model, the cultivation of relationships between lecturers and students, and the restructuring of learning schedules be factored in during policy development. The findings confirmed that these recommendations would help reverse the continuous dropouts currently experienced at Gert Sibande FET College. / Educational Leadership and Management / M. Ed. (Education Management) College dropouts Student individual-related Institutional related factors Facilities-related factors Student financial-related factors Lectures-related factors Management-related factors Programme related factors Test and exams-related factors 378.16913096827 Gert Sibande FET College -- Students
123	Mein Traum von Bibliothek Bauer, Charlotte, Schneider, Ulrich Johannes 09 June 2011 (has links) (PDF) Under the title „Mein Traum von Bibliothek“ ("My Dream of a Library"), a lecture series for librarians will start shortly at the Universitätsbibliothek Leipzig (Leipzig University Library). Its subject is the rapid change in media use through digital technology that all of us have experienced. The tasks of the library are changing, the work of librarians is changing, and the functions of library spaces are changing. This has consequences for all academic libraries, and in particular for a complex system such as that of the UB Leipzig, which currently still has 19 branch libraries. What spaces does a library have when both the information and the paths to it are digitally formatted? What service should be offered? Wissenschaftliche Bibliothek Medienkonsum Dienstleistung Konzeption Wissenschaftliche Bibliothek Bibliothekar Arbeitsfeld Universitätsbibliothek Leipzig Vortragsreihe academic library media consumption service conception academic library librarian field of work University Library Leipzig series of lectures ddc:020 rvk:AN 67700 rvk:AN 76000 rvk:AN 65200 rvk:AN 80491 Wissenschaftliche Bibliothek Medienkonsum Dienstleistung Konzeption Wissenschaftliche Bibliothek Bibliothekar Vortragsreihe
125	Population growth, the settlement process and economic progress : Adam Smith's theory of demo-economic development / Progrès et peuplement : la théorie démo-économique d’Adam Smith Lange, Jérôme 13 December 2017 (has links) La population - en son sens originel de processus de peuplement - est un sujet étonnamment absent de l'énorme volume d’études sur Adam Smith. Ce thème était au centre de la philosophie morale et de l'économie politique du 18e siècle, les deux domaines auxquels les contributions de Smith sont les plus connues. Son importance dans l’œuvre de Smith a été obscurcie au 20e siècle par une focalisation étroite sur les questions économiques dans la littérature secondaire. Pour une analyse intégrale de son œuvre, il est essentiel que la place centrale du peuplement soit révélée. Trois thèmes aujourd'hui considérés comme essentiels au projet de Smith sont ainsi intimement liés à la population : le lien entre division du travail et étendue du marché ; la théorie des quatre stades du progrès de la société ; et le lien entre développement rural et urbain, lui-même au centre du plaidoyer de Smith pour la liberté du commerce. Le marché est un concept aujourd'hui assimilé au fonctionnement du système économique capitaliste ; pour Smith, il décrivait la faculté de commercer, aux vecteurs essentiellement démographiques et géographiques. Le progrès de la société est à la fois cause et effet de la croissance de la population. En son sein se trouve l'interrelation symbiotique entre le développement rural et urbain que Smith appelait le «progrès naturel de l'opulence». Adopter l’optique smithienne plutôt que néo-malthusienne dans l'examen des dynamiques de population et de développement - y compris l'analyse de la transition démographique - conduit alors à une reconsidération fondamentale des interactions causales entre mortalité, fécondité, richesse et variables institutionnelles. 
/ Population - in its original sense of the process of peopling - is a topic surprisingly absent from the huge volume of scholarship on Adam Smith. This topic was central to 18th-century moral philosophy and political economy, the two fields to which Smith most famously contributed. Its importance in Smith’s work was obscured in the 20th century by a narrow focus on economic matters in the secondary literature. For an undivided analysis of Smith’s oeuvre it is crucial that the central position of the peopling process be brought to light. Three topics that are today recognised as essential to Smith’s project are thus intimately connected to population: the relation between the division of labour and the extent of the market; the stadial theory of progress; and the link between the development of town and country, itself central to Smith’s advocacy of the freedom of trade. The market is a concept read today through an institutional lens linking it to the functioning of the capitalist economic system; Smith conceived of it as a facility for trade, with essentially demographic and geographic vectors. The progress of society is both cause and effect of the growth of population. At its core is the symbiotic interrelationship between rural and urban development that Smith called the “natural progress of opulence”. In turn, looking at the dynamics of population and development - including the analysis of the demographic transition - through a Smithian rather than a neo-Malthusian lens leads to a fundamental reconsideration of the causal interactions between mortality, fertility, wealth and institutional variables. Adam Smith Population et développement Leçons de jurisprudence Causalité cumulative Liens ville-campagne Théorie de la transition démographique Adam Smith Population and development Conjectural history (four stages theory) Lectures on jurisprudence Cumulative causation Rural-urban linkages Demographic transition theory 330
126	Mein Traum von Bibliothek Bauer, Charlotte, Schneider, Ulrich Johannes 09 June 2011 (has links) Under the title „Mein Traum von Bibliothek“ ("My Dream of a Library"), a lecture series for librarians will start shortly at the Universitätsbibliothek Leipzig (Leipzig University Library). Its subject is the rapid change in media use through digital technology that all of us have experienced. The tasks of the library are changing, the work of librarians is changing, and the functions of library spaces are changing. This has consequences for all academic libraries, and in particular for a complex system such as that of the UB Leipzig, which currently still has 19 branch libraries. What spaces does a library have when both the information and the paths to it are digitally formatted? What service should be offered? info:eu-repo/classification/ddc/020 ddc:020
127	Scalable Multimedia Learning: From local eLectures to global Opencast Ketterl, Markus 27 March 2014 (has links) Universities want to go where the learners are, to share their rich scientific and intellectual knowledge beyond the walls of the academy and to expand the boundaries of the classroom. This desire has become a critical need as the worldwide economy adjusts to globalization and the need for advanced education and training becomes ever more pressing. Unfortunately, creating, processing, distributing and using quality multimedia learning content is expensive and technically challenging. This work combines research results, lessons learned and usage findings in the presentation of a scalable lecture capture solution built entirely on open source, one that is useful in the heterogeneous computing landscape of today’s universities and learning institutes. It addresses in particular the implemented user-facing applications and components, which enable lecturers, faculty and students to record, analyze and subsequently re-use the recorded multimedia learning material in multiple and attractive ways across devices and distribution platforms. adaptive multimedia data mining dynamic media objects eLectures e-learning feeds lecture recording learning portals multimedia micro learning m-learning human computer interaction rich internet applications user interfaces mobile development streams web 2.0 web technologies web lectures video based learning visual analytics open source opencast podcasting recommender systems social software social navigation ddc:000 ddc:500 ddc:600 ddc:620
128	„… ein Gemisch von Gehörtem und selbst Zugeseztem“ / Nachschriften der ‚Kosmos-Vorträge‘ Alexander von Humboldts: Dokumentation, Kontextualisierung und exemplarische Analysen Thomas, Christian 10 November 2023 (has links) Diese Dissertationsschrift ist angesiedelt im Bereich Digitaler Edition archivalischer Quellen, deren Erschließung und (computergestützter) Analyse. Im Zentrum stehen die sog. Kosmos-Vorträge, die Alexander von Humboldts 1827/28 in zwei Vortragszyklen in Berlin gehalten hat. Diese werden als gleichwertige, zweifache Publikationen in Humboldts Werkbiographie eingeordnet. In einem zentralen Kapitel (Kap. 7) geht es mir um eine editionstheoretische Fundierung der Edition von Vorlesungsnachschriften, zunächst allgemein und dann bezogen auf die Nachschriften der Kosmos-Vorträge. Zuvor wird das Forschungsfeld beleuchtet, da über die Rahmenbedingungen und Inhalte der beiden Vortragsreihen bislang nur wenig bekannt war. Humboldts Motivation zu diesen Vorträgen, deren Zusammenhang mit dem Kosmos (1845–62) und weiteren seiner Publikationen, sowie die jeweiligen organisatorischen Rahmenbedingungen werden untersucht. Inhaltlich sind die Kosmos-Vorträge bislang wenig erforscht worden, unter anderem weil die wichtigsten Quellen nicht rezipiert wurden. Dank der Digitalisierung des Humboldt-Nachlasses und vor allem durch die Digitale Edition der Nachschriften aus dem Hörerkreis sind die Voraussetzungen dafür mittlerweile sehr viel besser. Um die künftige Arbeit mit diesen Dokumenten zu unterstützen, dokumentiere und reflektiere ich in Kapitel 8 die praktische Umsetzung des Editionsmodells gemäß den Richtlinien der Text Encoding Initiative (TEI). Anschließend stelle ich die edierten Nachschriften aus beiden Vortragszyklen vor und zeige, wie sich mit den digitalen Volltexten arbeiten lässt. Dabei kommen quantitative Untersuchungen und Verfahren wie automatische Kollation bzw. Plagiatssuche, aber auch ‚traditionell hermeneutische‘ Methoden zum Einsatz. 
Schließlich geht es mir in meiner Arbeit darum, die Grundlage für die weitere Erforschung der beiden Vortragsreihen wesentlich zu verbessern und anhand einiger exemplarischer Analysen erste Schritte in diese Richtung zu unternehmen. / This dissertation is located in the field of digital editions of archival sources, their exploration and (computer-assisted) analysis. In terms of content, it deals with the so-called Kosmos-Lectures, which Alexander von Humboldt held in two distinct courses in Berlin in the winter of 1827/28. The two series are recognised as two distinct publications of equal value in Humboldt’s oeuvre. In a central chapter (chapter 7), I am concerned with an edition-theoretical foundation for the edition of attendees’ notebooks, first in general and then in relation to the transcripts of the Kosmos-Lectures. Before this, the research field of the long-neglected Kosmos-Lectures is illuminated, as little has been known about the framework conditions of the lecture series. Humboldt’s motivation for these lectures, their connection with the Kosmos (1845–62) and others of his publications, and the respective organisational framework of the courses are examined. In terms of content, the Kosmos-Lectures have so far been little researched, partly because the most important sources have not been taken into consideration. The conditions for this are now much better thanks to the digitisation of the Humboldt legacy collection and, above all, the digital edition of the transcripts from the audience. To facilitate future work with these documents, I document and reflect on the practical implementation of the edition model according to the guidelines of the Text Encoding Initiative (TEI) in chapter 8. In the following two chapters, I present the attendees’ notebooks from both courses, and show how to work with these digital full texts. 
Quantitative investigations and methods such as automatic collation or text re-use detection, but also ‘traditional hermeneutic’ approaches are used. Ultimately, my work aims to significantly improve the basis for research into the two lecture series, which has so far been lacking, and to take the first steps in this direction by means of some exemplary analyses. Alexander von Humboldt Kosmos-Vorträge Nachschriften Edition Korpus Annotation Text Encoding Initiative / TEI-XML Digital Humanities Digitale Edition Alexander von Humboldt Kosmos-Lectures Attendee's Notebooks Edition Corpus Annotation Text Encoding Initiative / TEI-XML Digital Humanities Digital Scholarly Edition GK 4953 AK 54545 EC 1200 ddc:830
129	Deep Neural Networks for Automatic Speech-To-Speech Translation of Open Educational Resources Pérez González de Martos, Alejandro Manuel 12 July 2022 (has links) [ES] En los últimos años, el aprendizaje profundo ha cambiado significativamente el panorama en diversas áreas del campo de la inteligencia artificial, entre las que se incluyen la visión por computador, el procesamiento del lenguaje natural, robótica o teoría de juegos. En particular, el sorprendente éxito del aprendizaje profundo en múltiples aplicaciones del campo del procesamiento del lenguaje natural tales como el reconocimiento automático del habla (ASR), la traducción automática (MT) o la síntesis de voz (TTS), ha supuesto una mejora drástica en la precisión de estos sistemas, extendiendo así su implantación a un mayor rango de aplicaciones en la vida real. En este momento, es evidente que las tecnologías de reconocimiento automático del habla y traducción automática pueden ser empleadas para producir, de forma efectiva, subtítulos multilingües de alta calidad de contenidos audiovisuales. Esto es particularmente cierto en el contexto de los vídeos educativos, donde las condiciones acústicas son normalmente favorables para los sistemas de ASR y el discurso está gramaticalmente bien formado. Sin embargo, en el caso de TTS, aunque los sistemas basados en redes neuronales han demostrado ser capaces de sintetizar voz de un realismo y calidad sin precedentes, todavía debe comprobarse si esta tecnología está lo suficientemente madura como para mejorar la accesibilidad y la participación en el aprendizaje en línea. Además, existen diversas tareas en el campo de la síntesis de voz que todavía suponen un reto, como la clonación de voz inter-lingüe, la síntesis incremental o la adaptación zero-shot a nuevos locutores. 
Esta tesis aborda la mejora de las prestaciones de los sistemas actuales de síntesis de voz basados en redes neuronales, así como la extensión de su aplicación en diversos escenarios, en el contexto de mejorar la accesibilidad en el aprendizaje en línea. En este sentido, este trabajo presta especial atención a la adaptación a nuevos locutores y a la clonación de voz inter-lingüe, ya que los textos a sintetizar se corresponden, en este caso, a traducciones de intervenciones originalmente en otro idioma. / [CA] Durant aquests darrers anys, l'aprenentatge profund ha canviat significativament el panorama en diverses àrees del camp de la intel·ligència artificial, entre les quals s'inclouen la visió per computador, el processament del llenguatge natural, robòtica o la teoria de jocs. En particular, el sorprenent èxit de l'aprenentatge profund en múltiples aplicacions del camp del processament del llenguatge natural, com ara el reconeixement automàtic de la parla (ASR), la traducció automàtica (MT) o la síntesi de veu (TTS), ha suposat una millora dràstica en la precisió i qualitat d'aquests sistemes, estenent així la seva implantació a un ventall més ampli a la vida real. En aquest moment, és evident que les tecnologies de reconeixement automàtic de la parla i traducció automàtica poden ser emprades per a produir, de forma efectiva, subtítols multilingües d'alta qualitat de continguts audiovisuals. Això és particularment cert en el context dels vídeos educatius, on les condicions acústiques són normalment favorables per als sistemes d'ASR i el discurs està gramaticalment ben format. No obstant això, al cas de TTS, encara que els sistemes basats en xarxes neuronals han demostrat ser capaços de sintetitzar veu d'un realisme i qualitat sense precedents, encara s'ha de comprovar si aquesta tecnologia és ja prou madura com per millorar l'accessibilitat i la participació en l'aprenentatge en línia. 
A més, hi ha diverses tasques al camp de la síntesi de veu que encara suposen un repte, com ara la clonació de veu inter-lingüe, la síntesi incremental o l'adaptació zero-shot a nous locutors. Aquesta tesi aborda la millora de les prestacions dels sistemes actuals de síntesi de veu basats en xarxes neuronals, així com l'extensió de la seva aplicació en diversos escenaris, en el context de millorar l'accessibilitat en l'aprenentatge en línia. En aquest sentit, aquest treball presta especial atenció a l'adaptació a nous locutors i a la clonació de veu interlingüe, ja que els textos a sintetitzar es corresponen, en aquest cas, a traduccions d'intervencions originalment en un altre idioma. / [EN] In recent years, deep learning has fundamentally changed the landscape of a number of areas in artificial intelligence, including computer vision, natural language processing, robotics, and game theory. In particular, the striking success of deep learning in a large variety of natural language processing (NLP) applications, including automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS), has resulted in major accuracy improvements, thus widening the applicability of these technologies in real-life settings. At this point, it is clear that ASR and MT technologies can be utilized to produce cost-effective, high-quality multilingual subtitles for video content of different kinds. This is particularly true in the case of transcription and translation of video lectures and other kinds of educational materials, in which the audio recording conditions are usually favorable for the ASR task and the speech is grammatically well formed. 
However, although state-of-the-art neural approaches to TTS have been shown to drastically improve the naturalness and quality of synthetic speech over conventional concatenative and parametric systems, it is still unclear whether this technology is already mature enough to improve accessibility and engagement in online learning, and particularly in the context of higher education. Furthermore, advanced topics in TTS such as cross-lingual voice cloning, incremental TTS or zero-shot speaker adaptation remain open challenges in the field. This thesis is about enhancing the performance and widening the applicability of modern neural TTS technologies in real-life settings, both in offline and streaming conditions, in the context of improving accessibility and engagement in online learning. Thus, particular emphasis is placed on speaker adaptation and cross-lingual voice cloning, as the input text corresponds to a translated utterance in this context. / Pérez González De Martos, AM. (2022). Deep Neural Networks for Automatic Speech-To-Speech Translation of Open Educational Resources [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/184019 / Premios Extraordinarios de tesis doctorales Traducción automática (MT) Síntesis de voz (TTS) Texto a voz Traducción de voz a voz Aprendizaje profundo Aprendizaje automático Inteligencia artificial Procesamiento del lenguaje natural Videoconferencias Accesibilidad Speech synthesis Text-to-speech Speech-to-speech translation Deep learning Machine learning Artificial intelligence Natural language processing Technology enhanced learning Video lectures Accessibility LENGUAJES Y SISTEMAS INFORMATICOS
130	Streaming Automatic Speech Recognition with Hybrid Architectures and Deep Neural Network Models Jorge Cano, Javier 30 December 2022 (has links) Thesis by compendium / [EN] Over the last decade, the media have undergone a revolution, moving away from conventional television toward on-demand content platforms. This revolution has changed not only the way we are entertained but also the way we learn: on-demand educational platforms have likewise proliferated and now provide educational resources of many kinds. These new ways of distributing content have brought new requirements for improving accessibility, particularly with regard to hearing difficulties and language barriers. This creates an opportunity for automatic speech recognition (ASR) to meet these requirements by providing high-quality automatic captioning. Such captioning provides a sound basis for narrowing the accessibility gap, especially for live or streaming content. Streaming ASR systems must work under strict real-time conditions, delivering captions as fast as possible while working with limited context. However, this limitation can degrade quality compared with systems for pre-recorded (offline) content.
This thesis proposes a low-latency streaming ASR system with a quality similar to that of an offline system. More precisely, it describes the path followed from an initial hybrid offline system to an efficient streaming-adapted system. The first step is to adapt the system to perform a single recognition pass using state-of-the-art neural network-based language models. In conventional multi-pass systems, these models are deferred to a second or later pass because of their high computational cost. Once the language model has been adapted, the neural network-based acoustic model must also be adapted to work with limited context. The integration and adaptation of these models is thoroughly described, and the resulting fully streaming-adapted ASR system is evaluated on widely used academic datasets and on challenging tasks based on real-world audiovisual content. The results show that the streaming-adapted system achieves low error rates with a short response time, comparable to the accuracy of the initial offline system.
/ Jorge Cano, J. (2022). Streaming Automatic Speech Recognition with Hybrid Architectures and Deep Neural Network Models [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/191001
/ Compendium, Automatic speech recognition, Streaming automatic speech recognition, Deep learning, Machine learning, Artificial intelligence, Natural language processing, Technology enhanced learning, Video lectures, Accessibility, LENGUAJES Y SISTEMAS INFORMATICOS
