Global ETD Search

201	Human computer interface based on hand gesture recognition
Bernard, Arnaud Jean Marc, 24 August 2010
With the improvement of multimedia technologies such as broadband-enabled HDTV, video on demand and internet TV, the computer and the TV are merging into a single device. Moreover, these technologies, as well as DVD and Blu-ray, provide menu navigation and interactive content. The growing interest in video conferencing has led to the integration of webcams into devices such as laptops, cell phones and even TV sets. Our approach is to use an embedded webcam directly to control a TV set remotely with hand gestures. Using specific gestures, a user can control the TV; a dedicated interface can then be used to select a TV channel, adjust the volume or browse videos from an online streaming server. This approach raises several challenges. First, relying on a simple webcam requires a vision-based system: from a single webcam, we need to detect the hand and identify its gesture or trajectory. Second, a TV set is usually installed in a living room, which implies constraints such as a potentially moving background and changes in luminance. These issues are discussed, along with the methods developed to address them. Video browsing is one example of the use of gesture recognition; to illustrate another application, we developed a simple game controlled by hand gestures. Finally, since the emergence of 3D TVs is enabling 3D video conferencing, we also consider using a stereo camera to recognize hand gestures.
Keywords: Video game; Skin color segmentation; H.264 motion vectors; Human computer interface; Hand gesture recognition; Template matching; Haar cascade; Computer vision; Video processing; Television remote control; Human-computer interaction; User-centered system design; User interfaces (Computer systems)
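Skin color segmentation is listed among the techniques used to locate the hand. As an illustration only, a common rule-based approach converts RGB to YCbCr and thresholds the chroma channels. The Cb range 77 to 127 and Cr range 133 to 173 used below are a widely cited heuristic and an assumption here, not values taken from the thesis.

```python
def rgb_to_cbcr(r, g, b):
    """Full-range ITU-R BT.601 (JPEG-style) chroma conversion; returns (Cb, Cr)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Classify one pixel as skin if both chroma values fall in the assumed ranges."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def skin_mask(image):
    """image: list of rows of (R, G, B) tuples -> list of rows of booleans."""
    return [[is_skin(*pixel) for pixel in row] for row in image]
```

Ignoring luma makes the rule somewhat tolerant to brightness changes, which matters in the living-room setting described above, although fixed thresholds remain sensitive to lighting color and skin tone.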
202	Real-time signal and image processing: "Implementation of H.264 on MPSoC"
Messaoudi, Kamel, 19 December 2012
This thesis was carried out under joint supervision (cotutelle) between Badji Mokhtar University (LERICA laboratory) and the University of Burgundy (LE2I laboratory, UMR CNRS 5158). It contributes to the study and implementation of the H.264/AVC encoder. Throughout the evolution of video compression standards, one fact has been confirmed ever more clearly: achieving good compression performance requires far more capable hardware in terms of computing power, flexibility and portability, in order to meet the demands of the various processing stages and satisfy the "real-time" criterion. One possible way to guarantee real-time operation for this kind of application is to use systems on chip (SoC) or multiprocessor systems on chip (MPSoC) implemented on reconfigurable FPGA-based platforms. The objective of this thesis is the study and implementation of signal and image processing algorithms, in particular the H.264/AVC standard, with the aim of achieving real-time performance for the encoding-decoding cycle. We use two Xilinx FPGA platforms (ML501 and XUPV5). Several decoder implementations already exist in the literature. For the encoder, despite considerable effort, work remains to optimize the algorithms and extract the available parallelism, especially given the variety of profiles and levels of the H.264/AVC standard. In the first stage of this thesis, we propose a hardware implementation of a memory controller dedicated to the H.264/AVC encoder. This controller is built by adding, on top of the DDR2 memory controller of the two Xilinx platforms, an intelligent layer that computes addresses and fetches the data required by the encoder's various processing modules. Next, we propose hardware (RTL-level) implementations of the processing modules of the H.264 encoder. These implementations exploit parallelism and pipelining to the extent permitted by the encoder's strong inter-block dependencies. We thus propose several improvements and new techniques in the Intra chain modules and the deblocking filter. Finally, we use the hardware modules for a hardware/software implementation of the H.264/AVC encoder. Synthesis and simulation results on the two Xilinx platforms are presented and compared with other existing implementations.
Subjects: Computer Science/Other; Engineering Sciences/Other
Keywords: Hardware implementation; Codesign; H.264/AVC encoder; Real time; Memory management; Memory controller; Parallelism; Pipelining; SoC; MPSoC; FPGA; Prototyping platform; Xilinx ML501; XUPV5
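The thesis proposes hardware modules for the H.264/AVC Intra chain. For readers unfamiliar with that stage, the sketch below shows three of the nine 4x4 luma intra prediction modes defined by the standard (vertical, horizontal and DC, assuming both top and left neighbors are available) and a simple SAD-based mode decision. It is a software illustration only, not the RTL design described in the thesis; encoders commonly use a rate-distortion cost rather than plain SAD.

```python
def predict_4x4(mode, top, left):
    """Predict a 4x4 block from 4 reconstructed pixels above (top) and to the left (left)."""
    if mode == "vertical":
        return [list(top) for _ in range(4)]          # copy the row above downwards
    if mode == "horizontal":
        return [[left[r]] * 4 for r in range(4)]      # copy the left column across
    if mode == "dc":
        dc = (sum(top) + sum(left) + 4) >> 3          # rounded mean of 8 neighbors
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def sad(block, pred):
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block, pred) for a, b in zip(row_a, row_b))


def best_mode(block, top, left):
    """Pick the mode with the lowest SAD (ties resolved in the listed order)."""
    return min(("vertical", "horizontal", "dc"),
               key=lambda m: sad(block, predict_4x4(m, top, left)))
```

In hardware, the three predictors and their SAD trees are independent of one another, which is the kind of parallelism the abstract refers to; the dependency is on the neighboring reconstructed pixels, which must be available before the block can be processed.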
203	Motion Based Event Analysis
Biswas, Sovan, January 2014
Motion is an important cue in videos that captures the dynamics of moving objects. It supports effective analysis of various event-related tasks such as human action recognition, anomaly detection, tracking, crowd behavior analysis and traffic monitoring. Accurate motion information is generally computed using optical flow estimation techniques. Coarse motion information, on the other hand, is readily available in the form of motion vectors in compressed videos. Using these encoded motion vectors reduces the computational burden of flow estimation and enables rapid analysis of video streams. This work focuses on analyzing motion patterns, retrieved from either motion vectors or optical flow, to perform event analysis tasks such as video classification, anomaly detection and crowd flow segmentation. In the first section, we use the motion vectors from H.264 compressed videos (a compression standard widely used for its high compression ratio) to address the following problems. (i) Video classification: we propose an approach that classifies videos by human action, capturing the spatio-temporal motion patterns of the actions with a Histogram of Oriented Motion Vectors (HOMV). (ii) Crowd flow segmentation: we address the segmentation of the dominant motion patterns of crowds; the proposed approach combines multi-scale super-pixel segmentation of the motion vectors to obtain the final flow segmentation. (iii) Anomaly detection: this problem is addressed by locally modeling usual behavior through features such as the magnitude and orientation of each moving object. In all of these approaches, the focus is on reducing computation while retaining accuracy comparable to pixel-domain processing. In the second section, we propose two approaches for anomaly detection using optical flow. The first uses spatio-temporal low-level motion features and detects anomalies based on the reconstruction error of the sparse representation of a candidate feature over a dictionary of usual-behavior features. The main contribution is enhancing each local dictionary by applying an appropriate transformation to the dictionaries of neighboring regions. The second algorithm aims to improve the accuracy of anomaly localization through short local trajectories of super-pixels belonging to moving objects; these trajectories capture both spatial and temporal information effectively. In contrast to compressed-domain analysis, these pixel-level approaches focus on improving detection accuracy while keeping detection speed reasonable.
Keywords: Video Classification; Anomaly Detection; Crowd Behavior Analysis; Crowd Flow Segmentation; Video Analysis; Motion Vectors; Human Action Recognition; Motion Based Event Analysis; Event Analysis; Histogram of Oriented Motion Vectors (HOMV); H.264 Compressed Videos; Computer Science
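The abstract describes classifying actions with a Histogram of Oriented Motion Vectors (HOMV) built from H.264 motion vectors. The exact HOMV definition (bin count, weighting, spatio-temporal pooling) is not given here, so the following is a minimal sketch under assumed choices: orientations are quantized into `n_bins` equal angular sectors, each vector votes with its magnitude, and the histogram is L1-normalized.

```python
import math


def homv(motion_vectors, n_bins=8):
    """Magnitude-weighted orientation histogram of (dx, dy) motion vectors (assumed design)."""
    hist = [0.0] * n_bins
    for dx, dy in motion_vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0:
            continue  # zero vectors (e.g. static background) carry no orientation
        angle = math.atan2(dy, dx) % (2 * math.pi)        # map to [0, 2*pi)
        b = int(angle / (2 * math.pi) * n_bins) % n_bins  # guard against angle == 2*pi
        hist[b] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

Because the motion vectors come straight from the bitstream, building such a descriptor needs only entropy decoding, not full pixel reconstruction, which is where the computational savings claimed for the compressed-domain approaches come from.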
204	Bitrate Reduction Techniques for Low-Complexity Surveillance Video Coding
Gorur, Pushkar, January 2016
High-resolution surveillance video cameras are invaluable resources for effective crime prevention and forensic investigations. However, the growing communication bandwidth requirements of high-definition surveillance video severely limit the number of cameras that can be deployed. A higher bitrate also increases operating expenses through higher data communication and storage costs. Hence, it is essential to develop low-complexity algorithms that reduce the data rate of the compressed video stream without affecting image fidelity. In this thesis, a computer-vision-aided H.264 surveillance video encoder and four associated algorithms are proposed to reduce the bitrate. The proposed techniques are (I) speeded-up foreground segmentation, (II) skip decision, (III) reference frame selection and (IV) face region-of-interest (ROI) coding. In the first part of the thesis, a modification to the adaptive Gaussian Mixture Model (GMM) based foreground segmentation algorithm is proposed to reduce computational complexity. This is achieved by replacing expensive floating-point computations with low-cost integer operations. To maintain accuracy, periodic floating-point updates of the GMM weight parameter are computed from the value of an integer counter. Experiments show speedups of 1.33x to 1.44x on standard video datasets in which a large fraction of pixels are multimodal. In the second part, we propose a skip decision technique that uses a spatial sampler to sample pixels. The sampled pixels are segmented using the speeded-up GMM algorithm, and the storage pattern of the GMM parameters in memory is modified to improve cache performance. Skip selection is then performed using the segmentation results of the sampled pixels. In the third part, a reference frame selection algorithm is proposed to maximize the number of background macroblocks (MBs), i.e. MBs that contain background image content, in the Decoded Picture Buffer. This reduces the cost of coding uncovered background regions. Distortion over foreground pixels is measured to quantify the performance of the skip decision and reference frame selection techniques. Experimental results on video surveillance datasets show bitrate savings of up to 94.5% over methods proposed in the literature. The proposed techniques also reduce compression complexity by up to 74.5% without increasing the distortion over the foreground regions of the video sequence. In the final part of the thesis, face and shadow region detection is combined with the skip decision algorithm to perform ROI coding for pedestrian surveillance videos. Since person identification requires high-quality face images, MBs containing face image content are encoded with a low Quantization Parameter (QP) setting, i.e. high quality. Other regions of the body are treated as RORI (regions of reduced interest) and encoded at low quality, and shadow regions are marked as Skip. Techniques that use only facial features to detect faces (e.g. the Viola-Jones face detector) are not robust in real-world scenarios. Hence, we propose to first detect pedestrians using deformable part models and then determine the face region from the deformed part locations. Detected pedestrians are tracked using an optical-flow-based tracker combined with a Kalman filter; the tracker improves accuracy and avoids rerunning the object detector on already detected pedestrians. Shadow and skin detector scores are computed over super-pixels. Bilattice-based logic inference combines multiple likelihood scores to classify the super-pixels as ROI, RORI or RONI (regions of no interest). The coding mode and QP values of the MBs are determined from the super-pixel labels. The proposed techniques provide a further bitrate reduction of up to 50.2%.
Keywords: Bitrate Reduction; Surveillance Video Coding; Video Surveillance; Gaussian Mixture Model (GMM); Pedestrian Surveillance Cameras; Region of Interest (ROI) Video Coding; Surveillance Video Cameras; Video Coding; Encoding Computational Complexity Reduction; H.264 Surveillance Coding; Gaussian Mixture Model Algorithm; Electrical Communication Engineering
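The skip decision described above samples a subset of pixels in each macroblock, segments them, and marks the macroblock as Skip when the samples are background. The sketch below assumes a precomputed per-pixel foreground mask standing in for the speeded-up GMM, a hypothetical sampling pattern (every 4th row and column, with the column phase shifted on each sampled row) and a hypothetical rule that more than `max_fg` sampled foreground pixels forces the macroblock to be coded. The thesis's actual sampler and threshold are not stated here.

```python
def sample_positions(mb_size=16, step=4):
    """Hypothetical sparse pattern: every `step`-th row and column, with the
    column phase shifted by one on each sampled row."""
    positions = []
    for i, r in enumerate(range(0, mb_size, step)):
        offset = i % step
        positions.extend((r, c) for c in range(offset, mb_size, step))
    return positions


def skip_decisions(fg_mask, mb_size=16, step=4, max_fg=0):
    """fg_mask: rows of booleans (True = foreground). Returns {(mb_row, mb_col): "skip" | "code"}."""
    height, width = len(fg_mask), len(fg_mask[0])
    pattern = sample_positions(mb_size, step)
    decisions = {}
    for my in range(0, height - mb_size + 1, mb_size):
        for mx in range(0, width - mb_size + 1, mb_size):
            # Only the sampled pixels are inspected (16 of 256 with the defaults).
            fg = sum(fg_mask[my + r][mx + c] for r, c in pattern)
            decisions[(my // mb_size, mx // mb_size)] = "skip" if fg <= max_fg else "code"
    return decisions
```

Sampling trades accuracy for speed: a small foreground object that falls entirely between sampled positions would be skipped, which is why the abstract quantifies performance through distortion over foreground pixels.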
205	Error-robust coding and transformation of compressed hybered hybrid video streams for packet-switched wireless networks
Halbach, Till, January 2004
This dissertation considers packet-switched wireless networks for the transmission of variable-rate layered hybrid video streams. Target applications are video streaming and broadcasting services. The work is divided into two main parts.
In the first part, a novel quality-scalable scheme based on coefficient refinement and encoder quality constraints is developed as a possible extension to the H.264 video coding standard. After a technical introduction to the coding tools of H.264, focusing mainly on error resilience features, quality scalability schemes from previous research are reviewed. Based on this discussion, an encoder-decoder framework is designed for an arbitrary number of quality layers, thereby also enabling region-of-interest coding. The performance of the new system is then tested exhaustively, showing that the bitrate increase typically encountered with scalable hybrid coding schemes is, for certain coding parameters, only small to moderate. The double- and triple-layer constellations of the framework are shown to outperform other systems.
The second part considers layered code streams as generated by the scheme of the first part. Various error propagation issues in hybrid streams are discussed, leading to the definition of a decoder quality constraint and a segmentation of the code stream to be transmitted. A packetization scheme based on successive source rate consumption is drafted, followed by the formulation of the channel code rate optimization problem for an optimum assignment of available codes to the channel packets. Appropriate MSE-based error metrics are derived, incorporating the properties of the source signal, a terminate-on-error decoding strategy, error concealment, inter-packet dependencies and the channel conditions. The Viterbi algorithm is presented as a low-complexity solution to the optimization problem, showing that the joint source-channel coding scheme adapts well to the channel conditions. An almost constant image quality is achieved, also in mismatch situations, while the overall channel code rate decreases only as much as necessary as the channel quality deteriorates. It is further shown that the variance of the code distributions is small and that the codes are assigned irregularly to all channel packets.
A double-layer constellation of the framework clearly outperforms other schemes by a substantial margin.
Keywords: Digital lossy video compression; visual communication; variable bit rate (VBR); SNR scalability; layered image processing; quality layer; hybrid code stream; predictive coding; progressive bit stream; joint source channel coding; fidelity constraint; channel error robustness; resilience; concealment; packet-switched; mobile and wireless ATM; noisy transmission; packet loss; binary symmetric channel; streaming; broadcasting; satellite and radio links; H.264; MPEG-4 AVC; Viterbi; trellis; unequal error protection
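The abstract formulates channel code assignment as an optimization solved with the Viterbi algorithm under a terminate-on-error decoder. The following is a simplified dynamic program in the same spirit, not the thesis's formulation: each packet gets one code from a hypothetical table of (source bits carried, probability that the packet decodes correctly), decoding stops at the first failed packet, and the goal is to maximize expected quality given by a hypothetical function of the number of source bits decoded. The states (packet index, source bits consumed so far) form a trellis, and memoization ensures each state is evaluated once.

```python
from functools import lru_cache


def assign_codes(n_packets, codes, quality):
    """Return (expected_quality, code index per packet).

    codes:   sequence of (source_bits, success_prob), one entry per channel code (hypothetical).
    quality: non-decreasing function of the number of decoded source bits (hypothetical).
    """

    @lru_cache(maxsize=None)
    def best(i, bits_so_far):
        # Past the last packet: every packet was decoded, quality depends on all bits sent.
        if i == n_packets:
            return quality(bits_so_far), ()
        options = []
        for idx, (src_bits, p_ok) in enumerate(codes):
            future, plan = best(i + 1, bits_so_far + src_bits)
            # Terminate-on-error: if packet i fails, quality freezes at bits_so_far.
            value = (1 - p_ok) * quality(bits_so_far) + p_ok * future
            options.append((value, (idx,) + plan))
        return max(options)

    return best(0, 0)
```

On a noisy channel (weak code succeeding with probability 0.1, strong code with 0.99), the sketch assigns the strong code to every packet; on a clean channel it switches to the weak code to carry more source bits, mirroring the qualitative behavior the abstract reports.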
207	Content-based hybrid video coding through data analysis/synthesis
Moinard, Matthieu, 01 July 2011
This thesis is devoted to designing algorithmic tools that increase the compression ratio of current video coding standards such as H.264/AVC. To this end, a preliminary study of a set of image restoration methods identified two distinct lines of research, which were then investigated. The first part is based on texture analysis and synthesis methods. This type of process, also known as template matching, is commonly used in video coding to predict part of the image texture from an analysis of its neighborhood. We sought to improve the prediction model by taking into account the specific features of an H.264/AVC-type video encoder. In particular, the rate-distortion function used in standard video coding schemes relies on an objective quality measure. This mechanism is inherently incompatible with the concept of texture synthesis, whose effectiveness is usually assessed by purely perceptual criteria. This contradiction motivated our first contribution. The second part of the thesis draws on image regularization methods based on total variation minimization. These methods were originally developed to improve the quality of an image using prior knowledge of the degradations it has undergone. Building on this work, we designed a model for predicting the transform coefficients of a natural image, which was integrated into a conventional video coding scheme.
Keywords: video coding; MPEG-4; H.264/AVC; intra-picture prediction; image synthesis; template matching; image regularization; transform coefficient prediction
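The abstract describes template matching: predicting a block from its already-decoded neighborhood. Below is a minimal sketch of the baseline technique, assuming a grayscale image stored as a list of rows, an L-shaped template of thickness `t` around a `b` x `b` block, SAD as the matching cost, and a search restricted to candidate blocks lying entirely above the target's rows (a simplifying stand-in for the decoded region). It illustrates the baseline only, not the rate-distortion-aware model proposed in the thesis.

```python
def template(img, y, x, b, t):
    """Pixels of the L-shaped template: t rows above and t columns left of the b x b block at (y, x)."""
    pixels = []
    for r in range(y - t, y):
        pixels.extend(img[r][x - t:x + b])   # top band, including the top-left corner
    for r in range(y, y + b):
        pixels.extend(img[r][x - t:x])       # left band
    return pixels


def tm_predict(img, y, x, b=4, t=2):
    """Predict the block at (y, x) by copying the block whose template best matches (min SAD)."""
    target = template(img, y, x, b, t)
    best = None
    for cy in range(t, y - b + 1):                  # candidate block ends above row y
        for cx in range(t, len(img[0]) - b + 1):
            cand = template(img, cy, cx, b, t)
            cost = sum(abs(p - q) for p, q in zip(target, cand))
            if best is None or cost < best[0]:
                best = (cost, cy, cx)
    if best is None:
        raise ValueError("no candidate block above the target")
    _, cy, cx = best
    return [row[cx:cx + b] for row in img[cy:cy + b]]
```

Since the decoder can repeat the same search on its own reconstructed pixels, no motion vector needs to be transmitted; the cost is that the match is judged on the template, not on the block itself, which is one source of the mismatch between synthesis quality and objective rate-distortion measures discussed above.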
208	Compression of Multi-View-plus-Depth (MVD) data: from the analysis of perceived quality to the design of tools for MVD data coding
Bosc, Emilie, 22 October 2012
This thesis addresses the compression of multi-view video in the context of 3D video, guided throughout by a constant concern for how humans perceive the medium. The studies and choices made during this thesis aim at the best possible perceived quality of synthesized views. The challenge of this work lies in investigating new compression techniques for multi-view-plus-depth (MVD) data that limit, as far as possible, the perceptible degradations in views synthesized from the decoded data. The difficulty stems from the fact that the sources of degradation in synthesized views are numerous and hard to measure with current image quality assessment techniques. For this reason, the work is organized around two main axes: assessing the quality of synthesized views and their specific artifacts, and studying MVD compression schemes guided by perceptual criteria. We carried out studies to characterize the artifacts related to DIBR (depth-image-based rendering) algorithms. Analyses of Student's t-tests on the scores from Pair Comparison and ACR-HR tests made it possible to determine how suitable these subjective quality assessment methods are for synthesized views. Evaluating objective image/video quality metrics also allowed us to establish their correlation with subjective scores. We then focused on depth map compression, proposing two methods for depth map coding derived from the LAR method. Based on our observations, we proposed a representation and coding strategy designed to preserve depth map discontinuities while achieving high compression ratios. Comparisons with state-of-the-art codecs (H.264/AVC, HEVC) show that our method yields images of better visual quality at low bitrates. We also studied how the bitrate should be distributed between texture and depth when compressing MVD sequences. The results of this thesis can help design new quality assessment protocols for synthesized data and new quality metrics; improve MVD coding schemes, notably through the original approaches proposed; and optimize MVD coding schemes based on our studies of the relationship between texture and depth.
Keywords: 3D video; quality assessment; compression; multi-view; MVD; ACR-HR; PC; H.264; HEVC; LAR; 3DTV; FTV; depth map
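The abstract mentions establishing the correlation between objective quality metrics and subjective scores. One common way to report this is the Pearson linear correlation coefficient between metric values and mean opinion scores; the sketch below computes it from scratch. Quality-assessment studies often fit a nonlinear mapping from metric to score first; that step is omitted here, and nothing below reproduces the thesis's actual protocol.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For example, `pearson(metric_scores, mos)` for one metric over a set of synthesized views gives a value near 1 when the metric ranks the views the way observers do, and near 0 when it fails to capture the artifacts observers notice.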
209	Multimedia Forensics Using Metadata
Xiang, Ziyue, 21 February 2024
The rapid development of machine learning techniques makes it possible to manipulate or synthesize video and audio while introducing nearly undetectable artifacts. Most media forensics methods analyze the high-level data (e.g. pixels of videos, temporal signals of audio) decoded from compressed media. Since media manipulation and synthesis methods usually aim to improve the quality of such high-level data directly, acquiring forensic evidence from these data has become increasingly challenging. In this work, we focus on media forensics techniques that use the metadata in media formats, which includes container metadata and coding parameters in the encoded bitstream. Since many media manipulation and synthesis methods do not attempt to hide metadata traces, these traces can be used for forensic tasks. First, we present a video forensics technique that uses metadata embedded in MP4/MOV video containers. The proposed method achieved high performance in video manipulation detection, source device attribution, social media attribution and manipulation tool identification on publicly available datasets. Second, we present an MP3 audio forensics technique based on a transformer neural network and low-level codec information. The proposed method can localize multiple compressed segments in MP3 files, with higher localization accuracy than other methods. Third, we present an H.264-based video device matching method, which can determine whether two video sequences were captured by the same device, even if the method has never encountered that device. The proposed method achieved good performance in a three-fold cross-validation scheme on a publicly available video forensics dataset containing 35 devices. Fourth, we present a Graph Neural Network (GNN) based approach for analyzing MP4/MOV metadata trees. The method is trained using Self-Supervised Learning (SSL), which increases its robustness and makes it capable of handling missing or unseen data. Fifth, we present an efficient approach to computing spectrogram features from MP3-compressed audio signals. The approach reduces the complexity of speech feature computation by about 77.6% and saves about 37.87% of MP3 decoding time. The resulting spectrogram features lead to better synthetic speech detection performance.
Keywords: Audio processing; Computer vision; Image and video coding; Image processing; Pattern recognition; Video processing; Digital forensics; Deep learning; Deepfake detection; Video forensics; Audio forensics; Video metadata; Audio metadata; H.264; MP3; MP4; Video manipulation detection; Video compression; Audio compression; Decision tree; Dimensionality reduction; Spectrogram; Graph neural networks; Neural networks; Transformer neural networks
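The first contribution relies on metadata in MP4/MOV containers, and the fourth treats that metadata as a tree. Both build on the ISO base media file format box structure: each box begins with a 32-bit big-endian size and a four-character type, a size of 1 means a 64-bit size follows, and a size of 0 means the box extends to the end of the file. The sketch below walks that structure into a nested list of (type, size, children). The set of container types it descends into is a small assumed subset, and it is not the feature extraction used in the thesis.

```python
import struct

# Assumed subset of box types whose payload is itself a sequence of boxes.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl", b"udta", b"edts", b"dinf"}


def parse_boxes(data, start=0, end=None):
    """Parse ISO BMFF boxes in data[start:end] into [(type, size, children), ...]."""
    end = len(data) if end is None else end
    boxes = []
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:                      # 64-bit "largesize" follows the type
            if pos + 16 > end:
                raise ValueError(f"truncated largesize at offset {pos}")
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:                    # box runs to the end of the data
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        children = parse_boxes(data, pos + header, pos + size) if box_type in CONTAINERS else []
        boxes.append((box_type.decode("latin-1"), size, children))
        pos += size
    return boxes
```

Applied to the bytes of a file opened in binary mode, the result exposes the box order, sizes and nesting; differences in these between capture devices and editing tools are the kind of trace that metadata-based forensics can exploit.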
