Global ETD Search

1	Facial-based Analysis Tools: Engagement Measurements and Forensics Applications
Bonomi, Mattia, 27 July 2020
Recent advances in technology make it easy to acquire and spread multi-dimensional multimedia content, e.g. videos, which in many cases depicts human faces. From such videos, valuable information describing the intrinsic characteristics of the recorded user can be retrieved: the features extracted from the facial patch are relevant descriptors that allow for measuring the subject's emotional status or identifying synthetic characters. One of the emerging challenges is the development of contactless approaches based on face analysis that measure the emotional status of the subject without placing sensors that limit or bias their experience. This is of even greater interest in the context of Quality of Experience (QoE) measurement, i.e. the measurement of the user's emotional status when exposed to multimedia content, since it allows for retrieving the overall acceptability of the content as perceived by the end user. Measuring the impact of given content on the user can have many implications from both the content producer's and the end user's perspectives. For this reason, we pursue the QoE assessment of a user watching multimedia stimuli, i.e. 3D movies, through the analysis of facial features acquired by contactless means. More specifically, the user's Heart Rate (HR) is retrieved by applying computer vision techniques to the facial recording of the subject and is then analysed to compute the level of engagement. We show that the proposed framework is effective for long video sequences, being robust to facial movements and illumination changes. We validate it on a dataset of 64 sequences in which users watch 3D movies selected to induce variations in their emotional status.
On the one hand, understanding the interaction between the user's perception of the content and their cognitive-emotional state opens many opportunities for content producers, who may influence people's emotional status according to needs driven by political, social, or business interests. On the other hand, the end user must be aware of the authenticity of the content being watched: advances in computer rendering have allowed fake subjects to spread in videos. Because of this, as a second challenge we target the identification of computer-generated (CG) characters in videos by applying two different approaches. First, we exploit the idea that fake characters do not present any pulse rate signal, while a human's pulse rate is expressed by a sinusoidal signal. Applying computer vision techniques to a facial video allows for the contactless estimation of the subject's HR, and thus for the identification of signals that lack strong sinusoidality, which indicate virtual humans. The proposed pipeline allows for fully automated discrimination and is validated on a dataset of 104 videos. Second, we make use of facial spatio-temporal texture dynamics that reveal the artefacts introduced by computer rendering techniques when creating a manipulation, e.g. face swapping, in videos depicting human faces. To do so, we consider multiple temporal video segments on which we estimate multi-dimensional (spatial and temporal) texture features. A binary decision based on the joint analysis of these features is applied to strengthen the classification accuracy. This is achieved using Local Derivative Patterns on Three Orthogonal Planes (LDP-TOP). Experimental analyses on state-of-the-art datasets of manipulated videos show the discriminative power of such descriptors in separating real and manipulated sequences and in identifying the creation method used.
The main finding of this thesis is the relevance of facial features in describing intrinsic characteristics of humans. These can be used to retrieve significant information such as the physiological response to multimedia stimuli or the authenticity of the human being itself. Applying the proposed approaches to benchmark datasets also returned good results, demonstrating real advances in this research field. In addition, these methods can be extended to different practical applications, from safety checks in autonomous driving to the identification of spoofing attacks, and from medical check-ups during sports to measuring users' engagement when watching advertising. We therefore encourage further investigation in this direction to improve the robustness of the methods and allow their application to increasingly challenging scenarios.
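The pulse-based cue in the abstract above (a human pulse trace is close to sinusoidal, while a rendered face carries no such signal) can be illustrated with a small spectral score. This is only a sketch under stated assumptions, not the thesis pipeline: it assumes a pulse trace has already been extracted from the face region, the 0.7 to 4 Hz heart-rate band is an assumed range, and the names `band_power_spectrum` and `sinusoidality` are hypothetical.

```python
import cmath
import math


def band_power_spectrum(signal, fs, f_lo=0.7, f_hi=4.0):
    """Return (frequency_hz, power) pairs of a naive DFT restricted to [f_lo, f_hi].

    The band limits are an assumed plausible heart-rate range, not values from the thesis.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC component
    spectrum = []
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(centered))
            spectrum.append((freq, abs(coeff) ** 2))
    return spectrum


def sinusoidality(signal, fs):
    """Return (score, heart_rate_bpm).

    score is the fraction of in-band power held by the strongest bin and its
    two neighbours: close to 1 for a clean sinusoid, low for a flat or noisy trace.
    """
    spectrum = band_power_spectrum(signal, fs)
    powers = [p for _, p in spectrum]
    total = sum(powers)
    if total == 0:
        return 0.0, None  # no in-band energy at all, so no pulse estimate
    peak = max(range(len(powers)), key=powers.__getitem__)
    near_peak = sum(powers[max(0, peak - 1): peak + 2])
    return near_peak / total, spectrum[peak][0] * 60.0
```

A trace whose score falls below a threshold chosen on labelled data would be flagged as a candidate virtual human; the threshold itself is left open here because the abstract does not state one.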
2	Multimedia Forensics Using Metadata
Ziyue Xiang (17989381), 21 February 2024
The rapid development of machine learning techniques makes it possible to manipulate or synthesize video and audio information while introducing nearly undetectable artifacts. Most media forensics methods analyze the high-level data (e.g., pixels from video, temporal signals from audio) decoded from compressed media data. Since media manipulation or synthesis methods usually aim to improve the quality of such high-level data directly, acquiring forensic evidence from these data has become increasingly challenging. In this work, we focus on media forensics techniques using the metadata in media formats, which includes container metadata and coding parameters in the encoded bitstream. Since many media manipulation and synthesis methods do not attempt to hide metadata traces, it is possible to use them for forensics tasks. First, we present a video forensics technique using metadata embedded in MP4/MOV video containers. Our proposed method achieved high performance in video manipulation detection, source device attribution, social media attribution, and manipulation tool identification on publicly available datasets. Second, we present a transformer neural network based MP3 audio forensics technique using low-level codec information. Our proposed method can localize multiple compressed segments in MP3 files, with higher localization accuracy than the other methods it was compared against. Third, we present an H.264-based video device matching method. This method can determine whether two video sequences were captured by the same device, even if the method has never encountered the device. Our proposed method achieved good performance in a three-fold cross-validation scheme on a publicly available video forensics dataset containing 35 devices. Fourth, we present a Graph Neural Network (GNN) based approach for the analysis of MP4/MOV metadata trees. The proposed method is trained using Self-Supervised Learning (SSL), which increases its robustness and makes it capable of handling missing/unseen data. Fifth, we present an efficient approach to computing spectrogram features from MP3 compressed audio signals. The proposed approach decreases the complexity of speech feature computation by ~77.6% and saves ~37.87% of MP3 decoding time. The resulting spectrogram features lead to higher synthetic speech detection performance.
Keywords: Audio processing; Computer vision; Image and video coding; Image processing; Pattern recognition; Video processing; Digital forensics; Deep learning; Deepfake detection; Video forensics; Audio forensics; Video metadata; Audio metadata; H.264; MP3; MP4; Video manipulation detection; Video compression; Audio compression; Decision tree; Dimensionality reduction; Spectrogram; Graph neural networks; Neural networks; Transformer neural networks
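The MP4/MOV metadata trees mentioned in the abstract above come from the container's nested box (atom) layout, in which each box starts with a 4-byte big-endian size and a 4-byte type. The following is a minimal sketch of walking that layout, not the thesis method: the function name `parse_boxes` is hypothetical, the `CONTAINERS` set is an illustrative subset of container box types, and payloads are not decoded.

```python
import struct

# Box types treated as pure containers here (an illustrative, incomplete subset).
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl", b"edts", b"dinf", b"udta"}


def parse_boxes(data, start=0, end=None):
    """Return the box tree of an ISO BMFF / QuickTime buffer.

    Each node is a (type, size, children) tuple; payloads are not decoded.
    """
    end = len(data) if end is None else end
    boxes = []
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if pos + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # box extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated box: stop rather than guess
        children = []
        if box_type in CONTAINERS:
            children = parse_boxes(data, pos + header, pos + size)
        boxes.append((box_type.decode("latin-1"), size, children))
        pos += size
    return boxes
```

The resulting nested tuples could then be flattened into features (for example, the ordered list of box paths) or turned into graph nodes and edges; which representation the thesis uses is not stated in the abstract.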
