Global ETD Search

1	Machine Learning Approaches for Speech Forensics Amit Kumar Singh Yadav (19984650) 31 October 2024 (has links) Several incidents report misuse of synthetic speech for impersonation attacks, spreading misinformation, and supporting financial fraud. To counter such misuse, this dissertation focuses on developing methods for speech forensics. First, we present a method to detect compressed synthetic speech. The method uses 33 times less information from the compressed bit stream than existing methods and still achieves high performance. Second, we present a transformer neural network method that uses a 2D spectral representation of speech signals to detect synthetic speech. The method shows high performance in detecting both compressed and uncompressed synthetic speech. Third, we present a method using an interpretable machine learning approach known as disentangled representation learning for synthetic speech detection. Fourth, we present a method for synthetic speech attribution, which identifies the source of a speech signal. If the speech is spoken by a human, we classify it as authentic (bona fide). If the speech signal is synthetic, we identify the generation method used to create it. We examine both closed-set and open-set attribution scenarios. In a closed-set scenario, we evaluate our approach only on the speech generation methods present in the training set. In an open-set scenario, we also evaluate on methods that are not present in the training set. Fifth, we propose a multi-domain method for synthetic speech localization. It processes multi-domain features obtained from a transformer using a ResNet-style MLP. We show that, with relatively fewer parameters, the proposed method performs better than existing methods. Finally, we present a new direction of research in speech forensics, i.e., the bias and fairness of synthetic speech detectors. By bias, we refer to an action in which a detector unfairly targets a specific demographic group of individuals and falsely labels their bona fide speech as synthetic. We show that existing synthetic speech detectors are biased with respect to gender, age, and accent. They are also biased against bona fide speech from people with speech impairments such as stuttering. We propose a set of augmentations that simulate stuttering in speech. We show that synthetic speech detectors trained with the proposed augmentations exhibit less bias than detectors trained without them. Speech recognition Audio processing Computer vision Image processing Digital forensics Deep learning media forensic speech forensics Anti-spoofing Deepfake Detection speech processing and recognition fair machine learning Deep learning autoencoders Transformer Neural Network Spectrogram analysis Self-Supervised Learning Disentangled representation learning synthetic speech attribution synthetic speech detection Multimedia Forensics
2	Speech Forensics Using Machine Learning Kratika Bhagtani (20699921) 10 February 2025 (has links) High-quality synthetic speech can now be generated and used maliciously. There is a need for speech forensic tools to detect synthetic speech. Besides detection, it is important to identify the synthesizer that was used to generate a given speech signal. This is known as synthetic speech attribution. Speech editing tools can be used to create partially synthetic speech in which only parts of the speech are synthetic. Detecting these synthetic parts is known as synthetic speech localization. We first propose a method for synthetic speech attribution known as the Patchout Spectrogram Attribution Transformer (PSAT). PSAT can distinguish unseen speech synthesis methods (unknown synthesizers) from the methods that were seen during its training (known synthesizers). It achieves more than 95% attribution accuracy. Second, we propose a method known as the Fine-Grain Synthetic Speech Attribution Transformer (FGSSAT) that can assign different labels to different unknown synthesizers. Existing methods, including PSAT, cannot distinguish between different unknown synthesizers. FGSSAT improves on existing work by performing a fine-grained synthetic speech attribution analysis. Third, we propose the Synthetic Speech Localization Convolutional Transformer (SSLCT) and achieve less than 10% Equal Error Rate (EER) for synthetic speech localization. Fourth, we demonstrate that existing methods do not perform well on recent diffusion-based synthesizers. We propose the Diffusion-Based Synthetic Speech Dataset (DiffSSD), consisting of about 200 hours of speech, including synthetic speech from 8 diffusion-based open-source and 2 commercial generators. We train speech forensic methods on this dataset and show its importance with respect to recent open-source and commercial generators. Speech recognition Audio processing Computer vision Image processing Multimodal analysis and synthesis Digital forensics Deep learning speech forensics media forensics Multimedia forensics Audio forensics Generative AI Detection Deepfake detection Spectrograms Anti-spoofing Deep learning methodologies Transformer network unsupervised methods Self-supervised features Synthetic Speech Generation Generative AI Synthetic Speech Attribution
3	Machine Learning for Speech Forensics and Hypersonic Vehicle Applications Emily R Bartusiak (6630773) 06 December 2022 (has links) Synthesized speech may be used for nefarious purposes, such as fraud, spoofing, and misinformation campaigns. We present several speech forensics methods based on deep learning to protect against such attacks. First, we use a convolutional neural network (CNN) and transformers to detect synthesized speech. Then, we investigate closed-set and open-set speech synthesizer attribution. We use a transformer to attribute a speech signal to its source (i.e., to identify the speech synthesizer that created it). Additionally, we show that our approach separates different known and unknown speech synthesizers in its latent space, even though it has not seen any of the unknown speech synthesizers during training. Next, we explore machine learning for an objective in the aerospace domain. Compared to conventional ballistic vehicles and cruise vehicles, hypersonic glide vehicles (HGVs) exhibit unprecedented abilities. They travel faster than Mach 5 and maneuver to evade defense systems and hinder prediction of their final destinations. We investigate machine learning for identifying different HGVs and a conic reentry vehicle (CRV) based on their aerodynamic state estimates. We also propose an HGV flight phase prediction method. Inspired by natural language processing (NLP), we model flight phases as “words” and HGV trajectories as “sentences.” Next, we learn a “grammar” from the HGV trajectories that describes their flight phase transition patterns. Given “words” from the initial part of an HGV trajectory and the “grammar”, we predict future “words” in the “sentence” (i.e., future HGV flight phases in the trajectory). We demonstrate that this approach successfully predicts future flight phases for HGV trajectories, especially in scenarios with limited training data. We also show that it can be used in a transfer learning scenario to predict flight phases of HGV trajectories that exhibit new maneuvers and behaviors never seen during training. Audio processing Computer vision Digital forensics Deep learning machine learning deep learning speech forensics media forensics convolutional neural network transformer convolutional transformer ensemble spectrogram analysis mel spectrogram analysis synthesized speech detection synthesized speech attribution closed set open set t-stochastic neighbor embedding latent space analysis hypersonics hypersonic glide vehicles vehicle classification flight phase prediction stochastic grammar k-nearest neighbors support vector machine probabilistic context-free grammar automatic distillation of structure generalized earley parser transfer learning
