1 |
PRIVACY PRESERVING AND EFFICIENT MACHINE LEARNING ALGORITHMS
Efstathia Soufleri (19184887) 21 July 2024
Extensive data availability has catalyzed the expansion of deep learning. Such advancements span image classification, speech processing, and natural language processing. However, this data-driven progress is often hindered by privacy restrictions preventing the public release of specific datasets. For example, some vision datasets cannot be shared due to privacy regulations, particularly those containing images depicting visually sensitive or disturbing content. At the same time, it is imperative to deploy deep learning models efficiently, specifically Deep Neural Networks (DNNs), which are the core of deep learning. In this dissertation, we focus on achieving efficiency by reducing the computational cost of DNNs in multiple ways.
This thesis first tackles the privacy concerns arising from deep learning. It introduces a novel methodology that synthesizes and releases synthetic data instead of private data. Specifically, we propose Differentially Private Image Synthesis (DP-ImgSyn) for generating and releasing synthetic images used for image classification tasks. These synthetic images satisfy the following three properties: (1) they carry differential privacy (DP) guarantees, (2) they preserve the utility of private images, ensuring that models trained on synthetic images achieve accuracy comparable to those trained on private data, and (3) they are visually dissimilar from private images. The DP-ImgSyn framework consists of the following steps. First, a teacher model is trained on private images using a DP training algorithm. Second, public images are used to initialize synthetic images, which are optimized to align with the private dataset. This optimization leverages the teacher network's batch normalization layer statistics (mean, standard deviation) to inject information from the private dataset into the synthetic images. Third, the synthetic images and their soft labels obtained from the teacher model are released and can be employed for neural network training in image classification tasks.
As a second direction, this thesis delves into achieving efficiency in deep learning. With neural networks widely deployed for tackling diverse and complex problems, the resulting models often become parameter-heavy, demanding substantial computational resources for deployment. To address this challenge, we focus on quantizing the weights and the activations of DNNs. In more detail, we propose a method for compressing neural networks through layer-wise mixed-precision quantization. Determining the optimal bit-width for each layer is a non-trivial task because the search space grows exponentially with the number of layers. Thus, we employ a Multi-Layer Perceptron (MLP) trained to determine a suitable bit-width for each layer. The Kullback-Leibler (KL) divergence of softmax outputs between the quantized and full-precision networks is the metric used to gauge quantization quality. We experimentally investigate the relationship between KL divergence and network size, noting that more aggressive quantization correlates with higher divergence and vice versa. The MLP is trained using the layer-wise bit-widths as labels and their corresponding KL divergence as inputs. To generate the training set, pairs of layer-wise bit-widths and their respective KL divergence values are obtained through Monte Carlo sampling of the search space. This approach aims to reduce the computational cost of DNN deployment while maintaining high classification accuracy.
Additionally, we aim to enhance efficiency in machine learning by introducing a computationally efficient method for action recognition on compressed videos. Rather than decompressing videos for action recognition tasks, our approach performs action recognition directly on the compressed videos. This is achieved by leveraging the modalities within the compressed video format, specifically motion vectors, residuals, and intra-frames. To process each modality, we deploy three neural networks, one per modality. Our observations indicate a hierarchy in convergence behavior: the network processing intra-frames tends to converge to a flatter minimum than the network processing residuals, which, in turn, converges to a flatter minimum than the motion vector network. This hierarchy motivates our strategy for knowledge transfer among modalities to achieve flatter minima, which are generally associated with better generalization. Based on this insight, we propose Progressive Knowledge Distillation (PKD), a technique that incrementally transfers knowledge across modalities. This method involves attaching early exits, known as Internal Classifiers (ICs), to the three networks. PKD begins by distilling knowledge from the motion vector network, then the residual network, and finally the intra-frame network, sequentially improving the accuracy of the ICs. Moreover, we introduce Weighted Inference with Scaled Ensemble (WISE), which combines outputs from the ICs using learned weights, thereby boosting accuracy during inference. The combination of PKD and WISE demonstrates significant improvements in efficiency and accuracy for action recognition on compressed videos.
In summary, this dissertation contributes to advancing privacy-preserving and efficient machine learning algorithms. The proposed methodologies offer practical solutions for deploying machine learning systems in real-world scenarios by addressing data privacy and computational efficiency. Through innovative approaches to image synthesis, neural network compression, and action recognition, this work aims to foster the development of robust and scalable machine learning frameworks for diverse computer vision applications.
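The quantization-quality metric described in the abstract above, the KL divergence between the softmax outputs of the full-precision and quantized networks, can be illustrated with a minimal sketch. The function names, the direction of the divergence (full-precision as the reference distribution), the averaging over a batch, and the example logits are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps guards against log(0); it is an implementation convenience.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def quantization_kl(full_logits, quant_logits):
    """Mean KL divergence over a batch of inputs.

    Assumption: the full-precision softmax output is the reference
    distribution and the quantized output is the approximation.
    """
    values = [
        kl_divergence(softmax(f), softmax(q))
        for f, q in zip(full_logits, quant_logits)
    ]
    return sum(values) / len(values)

# Hypothetical logits for two inputs from each network.
full = [[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]]
quant = [[1.8, 0.7, -0.9], [0.0, 1.0, 0.5]]
print(quantization_kl(full, quant))
```

A value of zero means the quantized network reproduces the full-precision output distribution exactly; larger values indicate a larger deviation.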
|
2 |
Video extraction for fast content access to MPEG compressed videos
Jiang, Jianmin; Weng, Y. 09 June 2009
As existing video processing technology is primarily
developed in the pixel domain yet digital video is stored in compressed
format, any application of those techniques to compressed
videos would require decompression. For discrete cosine transform
(DCT)-based MPEG compressed videos, the
standard row-by-row and column-by-column inverse DCT (IDCT)
of a block of 8 × 8 elements requires 4096 multiplications
and 4032 additions, although a practical implementation
requires only 1024 multiplications and 896 additions. In this paper, we
propose a new algorithm to extract videos directly from the MPEG
compressed domain (DCT domain) without full IDCT, which is
described in three extraction schemes: 1) video extraction in 2 × 2
blocks with four coefficients; 2) video extraction in 4 × 4 blocks
with four DCT coefficients; and 3) video extraction in 4 × 4 blocks
with nine DCT coefficients. The computing cost incurred is only
8 additions and no multiplications for the first scheme,
2 multiplications and 28 additions for the second scheme, and
47 additions (no multiplications) for the third scheme. Extensive
experiments were carried out, and the results reveal that: 1) the
extracted video maintains competitive quality in terms of visual
perception and inspection and 2) the extracted videos preserve the
content well in comparison with those fully decompressed ones
in terms of histogram measurement. As a result, the proposed
algorithm will provide useful tools for bridging the gap between the
pixel domain and the compressed domain, facilitating content analysis
with low latency and high efficiency in applications such as
surveillance video, interactive multimedia, and image processing.
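The operation counts quoted above for the 8 × 8 IDCT can be reproduced with simple arithmetic. The sketch below shows one counting scheme that yields the same figures; it is an assumption, since the abstract does not state how the counts were derived. The first function counts a direct evaluation in which every output pixel is a sum over all 64 coefficients, and the second counts a separable evaluation using unoptimized 8-point 1-D transforms over the rows and then the columns. The function names are hypothetical.

```python
def direct_idct_ops(n=8):
    """Count operations when each of the n*n outputs sums over all n*n coefficients."""
    outputs = n * n
    terms = n * n
    multiplications = outputs * terms
    additions = outputs * (terms - 1)
    return multiplications, additions

def separable_idct_ops(n=8):
    """Count operations for n row transforms followed by n column transforms.

    Each 1-D transform is assumed to be a plain n-point matrix-vector
    product (no fast algorithm): n*n multiplications, n*(n-1) additions.
    """
    mults_1d = n * n
    adds_1d = n * (n - 1)
    passes = 2 * n
    return passes * mults_1d, passes * adds_1d

print(direct_idct_ops())     # (4096, 4032)
print(separable_idct_ops())  # (1024, 896)
```

Against either baseline, the costs reported for the three extraction schemes (at most 2 multiplications and 47 additions) are far smaller.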
|
3 |
Motion Based Event Analysis
Biswas, Sovan January 2014
Motion is an important cue in videos that captures the dynamics of moving objects. It supports effective analysis in various event-related tasks such as human action recognition, anomaly detection, tracking, crowd behavior analysis, and traffic monitoring. Generally, accurate motion information is computed using various optical flow estimation techniques. On the other hand, coarse motion information is readily available in the form of motion vectors in compressed videos. Utilizing these encoded motion vectors reduces the computational burden involved in flow estimation and enables rapid analysis of video streams. In this work, the focus is on analyzing motion patterns, retrieved from either motion vectors or optical flow, to perform event analysis tasks such as video classification, anomaly detection, and crowd flow segmentation.
In the first section, we utilize the motion vectors from videos compressed with H.264, a compression standard widely used due to its high compression ratio, to address the following problems. i) Video classification: we propose an approach to classify videos based on human action by capturing the spatio-temporal motion patterns of the actions using the Histogram of Oriented Motion Vector (HOMV). ii) Crowd flow segmentation: we address the problem of segmenting the dominant motion patterns of crowds. The proposed approach combines multi-scale super-pixel segmentations of the motion vectors to obtain the final flow segmentation. iii) Anomaly detection: this problem is addressed by locally modeling usual behavior, capturing features such as the magnitude and orientation of each moving object. In all the above approaches, the focus is on reducing computation while retaining accuracy comparable to pixel-domain processing.
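To make the idea of an orientation histogram over motion vectors concrete, here is a simplified sketch. It is not the thesis's HOMV descriptor: the magnitude weighting, the number of bins, the L1 normalization, and the function name are assumptions, and the spatio-temporal partitioning mentioned above is omitted.

```python
import math

def orientation_histogram(motion_vectors, bins=8):
    """Magnitude-weighted histogram of motion-vector orientations.

    motion_vectors: iterable of (dx, dy) pairs, e.g. one per macroblock.
    Returns a list of `bins` values that sum to 1, or all zeros when
    every vector has zero length.
    """
    hist = [0.0] * bins
    bin_width = 2 * math.pi / bins
    for dx, dy in motion_vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0:
            continue  # a zero vector has no orientation
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
        index = min(int(angle / bin_width), bins - 1)
        hist[index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

# Hypothetical motion vectors pointing into the four quadrants.
vectors = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
print(orientation_histogram(vectors, bins=4))
```

In a classification pipeline, histograms of this kind would typically be computed per spatial region and time window and concatenated into a feature vector; that step is left out here.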
In the second section, we propose two approaches for anomaly detection using optical flow. The first approach uses spatio-temporal low-level motion features and detects anomalies based on the reconstruction error of the sparse representation of the candidate feature over a dictionary of usual behavior features. The main contribution is in enhancing each local dictionary by applying an appropriate transformation to the dictionaries of the neighboring regions. The other algorithm aims to improve the accuracy of anomaly localization through short local trajectories of super-pixels belonging to moving objects. These trajectories capture both spatial and temporal information effectively. In contrast to compressed-domain analysis, these pixel-level approaches focus on improving the accuracy of detection with reasonable detection speed.
|