71 |
Sequential Semantic Segmentation of Streaming Scenes for Autonomous DrivingGuo Cheng (13892388) 03 February 2023 (has links)
<p>In traffic scene perception for autonomous vehicles, driving videos are available from in-car sensors such as cameras and LiDAR for road detection and collision avoidance. Several challenges remain in computer vision tasks for video processing, including object detection and tracking, semantic segmentation, etc. First, because consecutive video frames are highly redundant, the traditional spatial-to-temporal approach inherently demands huge computational resources. Second, in many real-time scenarios, targets move continuously in the view as data streams in. To achieve a prompt response with minimum latency, an online model that processes the streaming data in shift mode is necessary. Third, in addition to shape-based recognition in the spatial domain, motion detection also relies on the inherent temporal continuity of videos, yet current works either lack long-term memory for reference or consume a huge amount of computation. </p>
<p><br></p>
<p>The purpose of this work is to achieve strongly temporally associated sensing results in real time with minimum memory, continually embedded into a pragmatic framework for speed and path planning. It takes a temporal-to-spatial approach to cope with fast-moving vehicles in autonomous navigation. It utilizes compact road profiles (RP) and motion profiles (MP) to identify path regions and dynamic objects, which drastically reduces the video data to a lower dimension and increases the sensing rate. Specifically, we sample a one-pixel line in each video frame; the temporal accumulation of lines from consecutive frames forms a road-profile image, while the motion profile consists of the average lines obtained by sampling a belt of pixels in each frame. By applying the dense temporal resolution to compensate for the sparse spatial resolution, this method reduces the 3D streaming data to a 2D image layout. Based on RP and MP under various weather conditions, three main tasks are conducted to contribute to the knowledge domain of perception and planning for autonomous driving. </p>
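The line-sampling step above can be sketched in a few lines of NumPy (a minimal illustration under our own assumptions about frame layout; the sampled row and belt indices are hypothetical, not the thesis's calibrated positions):

```python
import numpy as np

def build_road_profile(frames, row):
    # Take one pixel line (a fixed image row) from each frame; stacking
    # the lines over time yields a 2D road-profile image whose vertical
    # axis is time.
    return np.stack([f[row] for f in frames])

def build_motion_profile(frames, top, bottom):
    # Average a horizontal belt of rows in each frame down to one line,
    # then stack the averaged lines over time.
    return np.stack([f[top:bottom].mean(axis=0) for f in frames])
```

Stacking, say, 900 frames this way collapses a 30-second clip from 900 full H-by-W images into a single 900-by-W profile image, which is what makes the temporal-to-spatial reduction cheap.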
<p><br></p>
<p>The first application is semantic segmentation of temporal-to-spatial streaming scenes, including recognition of road and roadside, driving events, and objects that are static or in motion. Since the main vision sensing tasks for autonomous driving are identifying the road area to follow and locating traffic to avoid collision, this work tackles the problem by applying semantic segmentation to road and motion profiles. Though a one-pixel line may not contain sufficient spatial information about the road and objects, the consecutive collection of lines as a temporal-spatial image provides an intrinsic spatial layout because of the continuous observation and smooth vehicle motion. Moreover, by capturing the trajectory of pedestrians' moving legs in the motion profile, we can robustly distinguish pedestrians in motion against a smooth background. Experimental results on streaming data collected from various sensors, including camera and LiDAR, demonstrate that an effective recognition of the driving scene can be learned through semantic segmentation in the reduced temporal-to-spatial space.</p>
<p><br></p>
<p>The second contribution of this work is that it adapts standard semantic segmentation into a sequential semantic segmentation network (SE3), which is implemented as a new benchmark for image and video segmentation. Most state-of-the-art methods pursue accuracy through complex structures at the expense of memory use, which makes trained models heavily dependent on GPUs and thus inapplicable to real-time inference. Without accuracy loss, this work enables image segmentation with minimum memory. Specifically, instead of predicting an image patch, SE3 generates output along with the line scanning. By pinpointing the memory associated with the input line at each neural layer in the network, it preserves the same receptive field as the patch size but saves the computation in the overlapped regions during network shifting. In general, SE3 applies to most current backbone models in image segmentation, and it further extends inference by fusing temporal information, without increasing computational complexity, for video semantic segmentation. Thus, it achieves 3D association over a long range at the computational cost of a 2D setting. This will facilitate inference of semantic segmentation on lightweight devices.</p>
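The shift-mode, line-scanning idea can be illustrated with a toy rolling buffer (our sketch only; the real SE3 carries per-layer feature memory through a deep network, whereas here the "network" is a single fixed weighting over a k-line receptive field):

```python
import numpy as np

def line_scan_inference(lines, weights, k=3):
    # Emulate shift-mode inference: keep only the last k input lines in
    # memory; each new line completes one receptive field and yields one
    # output line, so overlapped rows are never recomputed or stored twice.
    buf = []
    out = []
    for line in lines:              # lines stream in one at a time
        buf.append(line)
        if len(buf) > k:
            buf.pop(0)              # memory stays bounded by k lines
        if len(buf) == k:
            field = np.stack(buf)   # (k, W) receptive field
            out.append(weights @ field)  # toy per-line prediction, shape (W,)
    return np.array(out)
```

Because only the last k lines are buffered, memory stays constant no matter how long the stream runs, while each output still sees its full k-row receptive field.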
<p><br></p>
<p>The third application is speed and path planning based on the sensing results from naturalistic driving videos. To avoid collision at close range and navigate a vehicle at middle and far ranges, several RP/MPs are scanned continuously at different depths for vehicle path planning. The semantic segmentation of RP/MP is further extended to multiple depths for path and speed planning according to the sensed headway and lane position. We conduct experiments on profiles at different sensing depths and build a smooth planning framework based on them. We also build an initial dataset of road and motion profiles with semantic labels from long HD driving videos. The dataset is published as an additional contribution to future work in computer vision and autonomous driving. </p>
|
72 |
A Real-Time Computational Decision Support System for Compounded Sterile Preparations using Image Processing and Artificial Neural NetworksRegmi, Hem Kanta January 2016 (has links)
No description available.
|
73 |
Semantic content analysis for effective video segmentation, summarisation and retrievalRen, Jinchang January 2009 (has links)
This thesis focuses on four main research themes, namely shot boundary detection, fast frame alignment, activity-driven video summarisation, and highlights-based video annotation and retrieval. A number of novel algorithms have been proposed to address these issues, which can be highlighted as follows. Firstly, accurate and robust shot boundary detection is achieved through modelling of cuts into sub-categories and appearance-based modelling of several gradual transitions, along with some novel features extracted from compressed video. Secondly, fast and robust frame alignment is achieved via the proposed subspace phase correlation (SPC) and an improved sub-pixel strategy. The SPC is proved to be insensitive to zero-mean noise, and its gradient-based extension is even robust to non-zero-mean noise and can be used to deal with non-overlapped regions for robust image registration. Thirdly, hierarchical modelling of rush videos using formal language techniques is proposed, which can guide the modelling and removal of several kinds of junk frames as well as adaptive clustering of retakes. With an extracted activity-level measurement, shots and sub-shots are detected for content-adaptive video summarisation. Fourthly, highlights-based video annotation and retrieval is achieved, in which statistical modelling of skin pixel colours, knowledge-based shot detection, and improved determination of camera motion patterns are employed. Within these proposed techniques, one important principle is to integrate various kinds of feature evidence and to incorporate prior knowledge in modelling the given problems. A high-level hierarchical representation is extracted from the original linear structure for effective management and content-based retrieval of video data. As most of the work is implemented in the compressed domain, one additional benefit is high efficiency, which will be useful for many online applications.
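For reference, baseline phase correlation, which the proposed SPC builds on, can be sketched as follows (a generic NumPy illustration of the principle, not the thesis's subspace or sub-pixel variant):

```python
import numpy as np

def phase_correlation(a, b):
    # Estimate the integer translation between two same-size images via
    # the normalized cross-power spectrum (phase-only correlation).
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cps = Fa * np.conj(Fb)
    cps /= np.abs(cps) + 1e-12        # whiten: keep phase only
    corr = np.real(np.fft.ifft2(cps))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak indices to signed shifts (FFT output is circular)
    h, w = a.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

For a pure circular shift the cross-power spectrum reduces to a single phase ramp, so the inverse transform peaks exactly at the displacement; the whitening step is what makes the peak insensitive to magnitude-only (illumination-like) changes between frames.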
|
74 |
Reconnaissance des actions humaines à partir d'une séquence vidéoTouati, Redha 12 1900 (has links)
The work presented in this master's thesis describes a new system for the
recognition of human actions from a video sequence. The system takes,
as input, a video sequence captured by a static camera. A binary
segmentation of the video sequence is first performed, by a learning
algorithm, in order to detect and extract the different people from
the background. To recognize an action, the system then exploits a set
of prototypes generated by an MDS-based dimensionality reduction
technique, from two different points of view on the video sequence.
This dimensionality reduction, applied from two different viewpoints,
allows us to model each human action of the training set with a set of
prototypes (assumed to be similar within each class) represented in a
low-dimensional non-linear space. The prototypes extracted from the
two viewpoints are fed to a $K$-NN classifier, which identifies the
human action taking place in the video sequence. Experiments with our
model on the Weizmann human action dataset give interesting results
compared to other state-of-the-art (and often more complicated)
methods. These experiments show both the sensitivity of our model to
each viewpoint and its effectiveness in recognizing the different
actions, with a variable but satisfactory recognition rate, as well as
the results obtained by fusing these two points of view, which achieves a high
recognition rate. / The work carried out in this master's project
presents a new system for the recognition of human actions from a
video sequence. The system takes as input a video sequence captured by
a static camera. A binary segmentation method is first applied, using
a learning algorithm, to separate the different people from the
background. To recognize an action, the system then exploits a set of
prototypes generated, by an MDS dimensionality reduction technique,
from two different viewpoints on the image sequence. This
dimensionality reduction step, performed from two different
viewpoints, models each action of the training set with a set of
prototypes (assumed to be relatively similar within each class)
represented in a low-dimensional non-linear space. The prototypes
extracted from the two viewpoints are fed to a K-NN classifier that
recognizes the action taking place in the video sequence. Experiments
with this system on the Weizmann human action dataset give quite
interesting results compared with other, more complex methods. These
experiments show, on the one hand, the sensitivity of the system to
each viewpoint and its effectiveness in recognizing the different
actions, with a variable but satisfactory recognition rate, and, on
the other hand, the results obtained by fusing these two viewpoints,
which yields a very high recognition rate.
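The prototype-and-vote pipeline described above can be sketched generically; classical (Torgerson) MDS stands in here for the thesis's MDS-based reduction, and all data, shapes, and parameters are illustrative assumptions rather than the thesis's setup:

```python
import numpy as np

def classical_mds(D, dim=2):
    # Classical (Torgerson) MDS: double-center the squared distance
    # matrix, then embed with the top eigenvectors of the Gram matrix.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)            # ascending eigenvalues
    order = np.argsort(vals)[::-1][:dim]      # take the top `dim`
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

def knn_predict(prototypes, labels, query, k=3):
    # Vote among the k nearest prototypes in the embedded space.
    dist = np.linalg.norm(prototypes - query, axis=1)
    nearest = labels[np.argsort(dist)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]
```

On an exact Euclidean distance matrix, classical MDS recovers the point configuration up to rotation, so inter-prototype distances, which is all the K-NN vote uses, are preserved.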
|
75 |
Human computer interface based on hand gesture recognitionBernard, Arnaud Jean Marc 24 August 2010 (has links)
With the improvement of multimedia technologies such as broadband-enabled HDTV, video on demand, and internet TV, the computer and the TV are merging into a single device. Moreover, these technologies, as well as DVD and Blu-ray, can provide menu navigation and interactive content.
The growing interest in video conferencing has led to the integration of the webcam into different devices such as laptops, cell phones, and even TV sets. Our approach is to use an embedded webcam directly to control a TV set remotely with hand gestures: using specific gestures through a dedicated interface, a user can select a TV channel, adjust the volume, or browse videos from an online streaming server.
This approach raises several challenges. The first is the use of a single, simple webcam, which leads to a vision-based system: from this single webcam, we need to recognize the hand and identify its gesture or trajectory. A TV set is usually installed in a living room, which implies constraints such as a potentially moving background and luminance changes. These issues are discussed further, along with the methods developed to resolve them. Video browsing is one example of the use of gesture recognition; to illustrate another application, we developed a simple game controlled by hand gestures.
The emergence of 3D TVs is enabling the development of 3D video conferencing. Therefore, we also consider the use of a stereo camera to recognize hand gestures.
|
77 |
Video event detection and visual data pro cessing for multimedia applicationsSzolgay, Daniel 30 September 2011 (has links)
This dissertation (i) describes an automatic procedure for estimating the stopping condition of iterative deconvolution methods, based on an orthogonality criterion between the estimated signal and its gradient at a given iteration; (ii) presents a method that decomposes an image into a geometric (or "cartoon") part and a "texture" part, using parameter estimation and a stopping condition based on anisotropic diffusion with orthogonality, exploiting the fact that these two components, "cartoon" and "texture", should be independent; (iii) describes a method for extracting moving foreground objects from video sequences captured by a wearable camera, which augments camera-motion compensation with a novel kernel-based estimation of the probability density function of the background pixels. The presented methods have been tested and compared with state-of-the-art algorithms. / This dissertation (i) describes an automatic procedure for estimating the stopping condition of non-regularized iterative deconvolution methods based on an orthogonality criterion of the estimated signal and its gradient at a given iteration; (ii) presents a decomposition method that splits the image into geometric (or cartoon) and texture parts using anisotropic diffusion with orthogonality-based parameter estimation and stopping condition, utilizing the theory that the cartoon and the texture components of an image should be independent of each other; (iii) describes a method for moving foreground object extraction in sequences taken by a wearable camera, with strong motion, where the camera-motion-compensated frame differencing is enhanced with a novel kernel-based estimation of the probability density function of the background pixels. The presented methods have been thoroughly tested and compared to other similar algorithms from the state of the art.
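The kernel-based background model of point (iii) can be sketched as follows (a generic Gaussian-KDE illustration; the bandwidth, threshold, and the omitted camera-motion compensation step are our assumptions, not the dissertation's settings):

```python
import numpy as np

def foreground_mask(history, frame, bandwidth=10.0, threshold=1e-3):
    # history: (N, H, W) stack of past (ideally motion-compensated) frames.
    # For each pixel, estimate the background intensity density at the
    # current value with a Gaussian kernel over its N past samples;
    # pixels with low background likelihood are labelled foreground.
    diffs = (history - frame[None]) / bandwidth
    kernels = np.exp(-0.5 * diffs ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    prob = kernels.mean(axis=0)
    return prob < threshold
```

Compared with plain frame differencing, the per-pixel density lets a multi-modal or slightly jittering background (leaves, compensation residue) still score as likely background, while a genuinely new intensity falls in a low-density region and is flagged.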
|
78 |
Počítačové vidění a detekce gest rukou a prstů / Computer vision and hand gestures detection and fingers trackingBravenec, Tomáš January 2019 (has links)
This master's thesis focuses on the detection and recognition of hand and finger gestures in static images and video sequences. The thesis summarizes several different approaches to the detection itself, together with their advantages and disadvantages. It also includes the implementation of a cross-platform application, written in Python using the OpenCV and PyTorch libraries, which can display a selected image or play a video with the recognized gestures highlighted.
|
79 |
Nerealistické zobrazení videa / Non-Realistic Video RenderingJohannesová, Daniela January 2011 (has links)
The aim of this thesis is non-realistic video rendering. It starts with a summary of existing techniques and then concentrates on selected methods that are able to work in real time. To process video more effectively, we use acceleration on the graphics processing unit by means of OpenGL and GLSL.
|
80 |
An Experimental Investigation of Water Droplet Growth, Deformation Dynamics and Detachment in a Non-Reacting PEM Fuel Cell via Fluorescence PhotometryMontello, Aaron David 08 December 2008 (has links)
No description available.
|