Global ETD Search

1	Defending Against Misuse of Synthetic Media: Characterizing Real-world Challenges and Building Robust Defenses Pu, Jiameng 07 October 2022 Recent advances in deep generative models have enabled the generation of realistic synthetic media or deepfakes, including synthetic images, videos, and text. However, synthetic media can be misused for malicious purposes and damage users' trust in online content. This dissertation aims to address several key challenges in defending against the misuse of synthetic media. Key contributions of this dissertation include the following: (1) Understanding challenges with the real-world applicability of existing synthetic media defenses. We curate synthetic videos and text from the wild, i.e., the Internet community, and assess the effectiveness of state-of-the-art defenses on synthetic content in the wild. In addition, we propose practical low-cost adversarial attacks, and systematically measure the adversarial robustness of existing defenses. Our findings reveal that most defenses show significant degradation in performance under real-world detection scenarios, which leads to the second thread of my work: (2) Building detection schemes with improved generalization performance and robustness for synthetic content. Most existing synthetic image detection schemes are highly content-specific, e.g., designed for only human faces, thus limiting their applicability. I propose an unsupervised content-agnostic detection scheme called NoiseScope, which does not require a priori access to synthetic images and is applicable to a wide variety of generative models, i.e., GANs. NoiseScope is also resilient against a range of countermeasures conducted by a knowledgeable attacker. For the text modality, our study reveals that state-of-the-art defenses that mine sequential patterns in the text using Transformer models are vulnerable to simple evasion schemes.
We conduct further exploration towards enhancing the robustness of synthetic text detection by leveraging semantic features. / Doctor of Philosophy / Recent advances in deep generative models have enabled the generation of realistic synthetic media or deepfakes, including synthetic images, videos, and text. However, synthetic media can be misused for malicious purposes and damage users' trust in online content. This dissertation aims to address several key challenges in defending against the misuse of synthetic media. Key contributions of this dissertation include the following: (1) Understanding challenges with the real-world applicability of existing synthetic media defenses. We curate synthetic videos and text from the Internet community, and assess the effectiveness of state-of-the-art defenses on the collected datasets. In addition, we systematically measure the robustness of existing defenses by designing practical low-cost attacks, such as changing the configuration of generative models. Our findings reveal that most defenses show significant degradation in performance under real-world detection scenarios, which leads to the second thread of my work: (2) Building detection schemes with improved generalization performance and robustness for synthetic content. Many existing synthetic image detection schemes make decisions by looking for anomalous patterns in a specific type of high-level content, e.g., human faces, thus limiting their applicability. I propose a blind content-agnostic detection scheme called NoiseScope, which does not require synthetic images for training, and is applicable to a wide variety of generative models. For the text modality, our study reveals that state-of-the-art defenses that mine sequential patterns in the text using Transformer models are not robust against simple attacks. We conduct further exploration towards enhancing the robustness of synthetic text detection by leveraging semantic features. 
Deepfake Datasets Deepfake Detection Synthetic Media Generative Models
2	NoiseLearner: An Unsupervised, Content-agnostic Approach to Detect Deepfake Images Vives, Cristian 21 March 2022 Recent advancements in generative models have improved the quality of hyper-realistic synthetic images or "deepfakes" at high resolutions, making them almost indistinguishable from real images from cameras. While exciting, this technology introduces room for abuse. Deepfakes have already been misused to produce pornography, political propaganda, and misinformation. The ability to produce fully synthetic content that can cause such misinformation demands robust deepfake detection frameworks. Most deepfake detection methods are trained in a supervised manner, and fail to generalize to deepfakes produced by newer and superior generative models. More importantly, such detection methods are usually focused on detecting deepfakes having a specific type of content, e.g., face deepfakes. However, other types of deepfakes are starting to emerge, e.g., deepfakes of biomedical images, satellite imagery, people, and objects shown in different settings. Taking these challenges into account, we propose NoiseLearner, an unsupervised and content-agnostic deepfake image detection method. NoiseLearner aims to detect any deepfake image regardless of the generative model of origin or the content of the image. We perform a comprehensive evaluation by testing on multiple deepfake datasets composed of different generative models and different content groups, such as faces, satellite images, landscapes, and animals. Furthermore, we include more recent state-of-the-art generative models in our evaluation, such as StyleGAN3 and denoising diffusion probabilistic models (DDPM). We observe that NoiseLearner performs well on multiple datasets, achieving 96% accuracy on both StyleGAN and StyleGAN2 datasets.
/ Master of Science / Images synthesized by artificial intelligence, commonly known as deepfakes, are starting to become indistinguishable from real images. While these technological advances are exciting with regard to what a computer can do, it is important to understand that such technology is currently being used with ill intent. Thus, identifying these images is becoming a growing necessity, especially as deepfake technology grows to perfectly mimic the nature of real images. Current deepfake detection approaches fail to detect deepfakes of other content, such as satellite imagery or X-rays, and cannot generalize to deepfakes synthesized by new artificial intelligence. Taking these concerns into account, we propose NoiseLearner, a deepfake detection method that can detect any deepfake regardless of the content and artificial intelligence model used to synthesize it. The key idea behind NoiseLearner is that it does not require any deepfakes to train. Instead, NoiseLearner learns the key features of real images and uses them to differentiate between deepfakes and real images, without ever looking at a single deepfake. Even with this strong constraint, NoiseLearner shows promise by detecting deepfakes of diverse contents and models used to generate them. We also explore different ways to improve NoiseLearner.
deepfake deepfake detection generative models GAN artificial intelligence Machine learning security in Machine learning
3	Multimedia Forensics Using Metadata Ziyue Xiang (17989381) 21 February 2024 The rapid development of machine learning techniques makes it possible to manipulate or synthesize video and audio information while introducing nearly undetectable artifacts. Most media forensics methods analyze the high-level data (e.g., pixels from videos, temporal signals from audio) decoded from compressed media data. Since media manipulation or synthesis methods usually aim to improve the quality of such high-level data directly, acquiring forensic evidence from these data has become increasingly challenging. In this work, we focus on media forensics techniques using the metadata in media formats, which includes container metadata and coding parameters in the encoded bitstream. Since many media manipulation and synthesis methods do not attempt to hide metadata traces, it is possible to use them for forensics tasks. First, we present a video forensics technique using metadata embedded in MP4/MOV video containers. Our proposed method achieved high performance in video manipulation detection, source device attribution, social media attribution, and manipulation tool identification on publicly available datasets. Second, we present a transformer neural network based MP3 audio forensics technique using low-level codec information. Our proposed method can localize multiple compressed segments in MP3 files. The localization accuracy of our proposed method is higher than that of other methods. Third, we present an H.264-based video device matching method. This method can determine whether two video sequences were captured by the same device even if the method has never encountered the device. Our proposed method achieved good performance in a three-fold cross validation scheme on a publicly available video forensics dataset containing 35 devices. Fourth, we present a Graph Neural Network (GNN) based approach for the analysis of MP4/MOV metadata trees. The proposed method is trained using Self-Supervised Learning (SSL), which increases its robustness and makes it capable of handling missing/unseen data. Fifth, we present an efficient approach to compute the spectrogram feature from MP3 compressed audio signals. The proposed approach decreases the complexity of speech feature computation by ~77.6% and saves ~37.87% of MP3 decoding time. The resulting spectrogram features lead to higher synthetic speech detection performance.
Audio processing Computer vision Image and video coding Image processing Pattern recognition Video processing Digital forensics Deep learning Deepfake detection Digital forensics Video forensics Audio forensics Video metadata Audio metadata H.264 MP3 MP4 Video manipulation detection Video compression Audio compression Decision tree Deep learning Dimensionality reduction Spectrogram Graph neural networks Neural networks Transformer neural networks
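The MP4/MOV "metadata tree" mentioned in this abstract comes from the container's nested box (atom) layout, where each box starts with a 4-byte big-endian size and a 4-byte type. The sketch below is only an illustration of walking that layout, not the thesis's method: the helper names (`iter_boxes`, `box_tree`, `make_box`), the short `CONTAINERS` list, and the synthetic `sample` bytes are all assumptions made for the example.

```python
import struct
from typing import Iterator, List, Optional, Tuple

# Hypothetical, deliberately short list of box types that contain child boxes.
CONTAINERS = {"moov", "trak", "mdia", "minf", "stbl"}


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, payload_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > end:
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of its parent.
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed or truncated box; stop rather than guess
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


def box_tree(data: bytes, offset: int = 0, end: Optional[int] = None,
             depth: int = 0) -> List[Tuple[int, str, int]]:
    """Return a flat, depth-first list of (depth, type, payload_size)."""
    rows = []
    for box_type, start, stop in iter_boxes(data, offset, end):
        rows.append((depth, box_type, stop - start))
        if box_type in CONTAINERS:
            rows.extend(box_tree(data, start, stop, depth + 1))
    return rows


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a synthetic box for the demo (32-bit size form only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic bytes that follow the box layout; not a playable file.
sample = make_box(b"ftyp", b"isom" + b"\x00" * 4) + make_box(
    b"moov", make_box(b"mvhd", b"\x00" * 4) + make_box(b"trak", make_box(b"tkhd"))
)

for depth, box_type, payload_size in box_tree(sample):
    print("  " * depth + f"{box_type} ({payload_size} bytes)")
```

A forensic feature extractor would go on to read the fields inside each box; this sketch stops at the tree shape, which is the part the abstract describes.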
4	Machine Learning Approaches for Speech Forensics Amit Kumar Singh Yadav (19984650) 31 October 2024 Several incidents report misuse of synthetic speech for impersonation attacks, spreading misinformation, and supporting financial fraud. To counter such misuse, this dissertation focuses on developing methods for speech forensics. First, we present a method to detect compressed synthetic speech. The method uses 33 times less information from the compressed bit stream than existing methods and achieves high performance. Second, we present a transformer neural network method that uses a 2D spectral representation of speech signals to detect synthetic speech. The method shows high performance on detecting both compressed and uncompressed synthetic speech. Third, we present a method using an interpretable machine learning approach known as disentangled representation learning for synthetic speech detection. Fourth, we present a method for synthetic speech attribution. It identifies the source of a speech signal. If the speech is spoken by a human, we classify it as authentic/bona fide. If the speech signal is synthetic, we identify the generation method used to create it. We examine both closed-set and open-set attribution scenarios. In a closed-set scenario, we evaluate our approach only on the speech generation methods present in the training set. In an open-set scenario, we also evaluate on methods which are not present in the training set. Fifth, we propose a multi-domain method for synthetic speech localization. It processes multi-domain features obtained from a transformer using a ResNet-style MLP. We show that with relatively fewer parameters, the proposed method performs better than existing methods. Finally, we present a new direction of research in speech forensics, i.e., bias and fairness of synthetic speech detectors. By bias, we refer to an action in which a detector unfairly targets a specific demographic group of individuals and falsely labels their bona fide speech as synthetic. We show that existing synthetic speech detectors are biased with respect to gender, age, and accent. They are also biased against bona fide speech from people with speech impairments such as stuttering. We propose a set of augmentations that simulate stuttering in speech. We show that synthetic speech detectors trained with the proposed augmentations have less bias than detectors trained without them.
Speech recognition Audio processing Computer vision Image processing Digital forensics Deep learning media forensic speech forensics Anti-spoofing Deepfake Detection speech processing and recognition fair machine learning Deep learning autoencoders Transformer Neural Network Spectrogram analysis Self-Supervised Learning Disentangled representation learning synthetic speech attribution synthetic speech detection Multimedia Forensics
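The definition of bias in this abstract (bona fide speech from one group being falsely labelled synthetic more often than another's) can be made concrete as a per-group false positive rate. The sketch below is one plausible way to measure it, not necessarily the metric used in the dissertation; the function name, the group labels `"a"` and `"b"`, and the toy predictions are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def false_positive_rate_by_group(labels: Sequence[int],
                                 predictions: Sequence[int],
                                 groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Per-group rate at which bona fide speech (label 0) is flagged synthetic (1)."""
    false_positives = defaultdict(int)
    bona_fide_total = defaultdict(int)
    for label, prediction, group in zip(labels, predictions, groups):
        if label == 0:  # only bona fide samples can be false positives
            bona_fide_total[group] += 1
            if prediction == 1:
                false_positives[group] += 1
    return {g: false_positives[g] / bona_fide_total[g] for g in bona_fide_total}


# Toy data (assumed): eight bona fide clips and two synthetic ones.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predictions = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b", "a", "b"]

rates = false_positive_rate_by_group(labels, predictions, groups)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

On this toy data, group `"b"` has its bona fide speech flagged twice as often as group `"a"`; a gap near zero would indicate that the detector treats the groups similarly on this measure.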
5	Speech Forensics Using Machine Learning Kratika Bhagtani (20699921) 10 February 2025 High quality synthetic speech can now be generated and used maliciously. There is a need for speech forensic tools to detect synthetic speech. Besides detection, it is important to identify the synthesizer that was used for generating a given speech signal. This is known as synthetic speech attribution. Speech editing tools can be used to create partially synthetic speech in which only parts of the speech are synthetic. Detecting these synthetic parts is known as synthetic speech localization. We first propose a method for synthetic speech attribution known as the Patchout Spectrogram Attribution Transformer (PSAT). PSAT can distinguish unseen speech synthesis methods (unknown synthesizers) from the methods that were seen during its training (known synthesizers). It achieves more than 95% attribution accuracy. Second, we propose a method known as Fine-Grain Synthetic Speech Attribution Transformer (FGSSAT) that can assign different labels to different unknown synthesizers. Existing methods including PSAT cannot distinguish between different unknown synthesizers. FGSSAT improves on existing work by doing a fine-grain synthetic speech attribution analysis. Third, we propose Synthetic Speech Localization Convolutional Transformer (SSLCT) and achieve less than 10% Equal Error Rate (EER) for synthetic speech localization. Fourth, we demonstrate that existing methods do not perform well for recent diffusion-based synthesizers. We propose the Diffusion-Based Synthetic Speech Dataset (DiffSSD) consisting of about 200 hours of speech, including synthetic speech from 8 diffusion-based open-source and 2 commercial generators. We train speech forensic methods on this dataset and show its importance with respect to recent open-source and commercial generators.
Speech recognition Audio processing Computer vision Image processing Multimodal analysis and synthesis Digital forensics Deep learning speech forensics media forensics Multimedia forensics Audio forensics Generative AI Detection Deepfake detection Spectrograms Anti-spoofing Deep learning methodologies Transformer network unsupervised methods Self-supervised features Synthetic Speech Generation Generative AI Synthetic Speech Attribution
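The Equal Error Rate (EER) quoted in this abstract is the operating point where the false acceptance rate and the false rejection rate are equal. The sketch below shows one simple way to estimate it from detector scores by sweeping thresholds; it assumes that higher scores mean "more likely synthetic" and uses made-up scores, and it is not taken from the thesis's evaluation code.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Estimate EER; labels use 1 for synthetic and 0 for bona fide."""
    n_synthetic = sum(labels)
    n_bona_fide = len(labels) - n_synthetic
    if n_synthetic == 0 or n_bona_fide == 0:
        raise ValueError("need at least one sample of each class")

    best_gap, best_eer = None, None
    # Try every observed score as a threshold, plus one above all scores.
    for threshold in sorted(set(scores)) + [float("inf")]:
        # Bona fide samples at or above the threshold are wrongly flagged.
        false_accepts = sum(1 for s, y in zip(scores, labels)
                            if y == 0 and s >= threshold)
        # Synthetic samples below the threshold are missed.
        false_rejects = sum(1 for s, y in zip(scores, labels)
                            if y == 1 and s < threshold)
        far = false_accepts / n_bona_fide
        frr = false_rejects / n_synthetic
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy scores (assumed): four bona fide segments, then four synthetic ones.
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(equal_error_rate(scores, labels))
```

For localization, the same computation would typically be applied per segment or per frame rather than per utterance; that granularity is an assumption here, since the abstract does not specify it.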
