1. On Deep Multiscale Recurrent Neural Networks

Chung, Junyoung, 04 1900
No description available.
2. Speech Forensics Using Machine Learning

Kratika Bhagtani (20699921), 10 February 2025
High-quality synthetic speech can now be generated and used maliciously, so speech forensics tools are needed to detect it. Beyond detection, it is important to identify which synthesizer generated a given speech signal; this is known as synthetic speech attribution. Speech editing tools can produce partially synthetic speech in which only some segments are synthetic, and detecting those segments is known as synthetic speech localization.

We first propose a method for synthetic speech attribution, the Patchout Spectrogram Attribution Transformer (PSAT). PSAT distinguishes speech synthesis methods unseen during its training (unknown synthesizers) from those seen during training (known synthesizers), achieving more than 95% attribution accuracy. Second, we propose the Fine-Grain Synthetic Speech Attribution Transformer (FGSSAT), which can assign distinct labels to different unknown synthesizers; existing methods, including PSAT, cannot distinguish between different unknown synthesizers. FGSSAT improves on existing work through fine-grained attribution analysis. Third, we propose the Synthetic Speech Localization Convolutional Transformer (SSLCT), which achieves less than 10% Equal Error Rate (EER) for synthetic speech localization. Fourth, we demonstrate that existing methods do not perform well on recent diffusion-based synthesizers. We propose the Diffusion-Based Synthetic Speech Dataset (DiffSSD), consisting of about 200 hours of speech, including synthetic speech from 8 open-source diffusion-based generators and 2 commercial generators. We train speech forensics methods on this dataset and show its importance with respect to recent open-source and commercial generators.
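The abstract reports results as Equal Error Rate, the operating point where the false-positive rate equals the false-negative rate. As a reference for that metric only, here is a minimal numpy sketch of the conventional EER computation from detection scores; the function name and the example labels/scores are illustrative and are not taken from the thesis:

    import numpy as np

    def equal_error_rate(labels, scores):
        """EER: the point where false-positive rate == false-negative rate.
        labels: 1 = synthetic, 0 = genuine; scores: higher = more synthetic.
        """
        order = np.argsort(scores)[::-1]       # sweep threshold from high to low
        labels = np.asarray(labels)[order]
        tp = np.cumsum(labels)                 # true positives at each threshold
        fp = np.cumsum(1 - labels)             # false positives at each threshold
        fnr = 1 - tp / tp[-1]                  # miss rate
        fpr = fp / fp[-1]                      # false-alarm rate
        idx = np.argmin(np.abs(fnr - fpr))     # closest point to FNR == FPR
        return (fnr[idx] + fpr[idx]) / 2

    # Hypothetical frame-level scores from a localizer (illustrative data).
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1]
    print(f"EER = {equal_error_rate(labels, scores):.3f}")  # EER = 0.250

Since the ROC is a step function on finite data, the exact FNR == FPR crossing may fall between thresholds; this sketch takes the nearest discrete point, while other implementations interpolate.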
