• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • 1
  • 1
  • Tagged with
  • 11
  • 11
  • 4
  • 3
  • 3
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Machine Learning Approaches for Speech Forensics

Amit Kumar Singh Yadav (19984650) 31 October 2024 (has links)
<p dir="ltr">Several incidents report misuse of synthetic speech for impersonation attacks, spreading misinformation, and supporting financial frauds. To counter such misuse, this dissertation focuses on developing methods for speech forensics. First, we present a method to detect compressed synthetic speech. The method uses comparatively 33 times less information from compressed bit stream than used by existing methods and achieve high performance. Second, we present a transformer neural network method that uses 2D spectral representation of speech signals to detect synthetic speech. The method shows high performance on detecting both compressed and uncompressed synthetic speech. Third, we present a method using an interpretable machine learning approach known as disentangled representation learning for synthetic speech detection. Fourth, we present a method for synthetic speech attribution. It identifies the source of a speech signal. If the speech is spoken by a human, we classify it as authentic/bona fide. If the speech signal is synthetic, we identify the generation method used to create it. We examine both closed-set and open-set attribution scenarios. In a closed-set scenario, we evaluate our approach only on the speech generation methods present in the training set. In an open-set scenario, we also evaluate on methods which are not present in the training set. Fifth, we propose a multi-domain method for synthetic speech localization. It processes multi-domain features obtained from a transformer using a ResNet-style MLP. We show that with relatively less number of parameters, the proposed method performs better than existing methods. Finally, we present a new direction of research in speech forensics <i>i.e.</i>, bias and fairness of synthetic speech detectors. By bias, we refer to an action in which a detector unfairly targets a specific demographic group of individuals and falsely labels their bona fide speech as synthetic. We show that existing synthetic speech detectors are gender, age and accent biased. They also have bias against bona fide speech from people with speech impairments such as stuttering. We propose a set of augmentations that simulate stuttering in speech. We show that synthetic speech detectors trained with proposed augmentation have less bias relative to detector trained without it.</p>

Page generated in 0.0415 seconds