11

Pavement surface distress evaluation using video image analysis

Acosta, Jesus-Adolfo January 1994 (has links)
No description available.
12

Video Categorization Using Semantics and Semiotics

Rasheed, Zeeshan 01 January 2003 (has links) (PDF)
There is a great need to automatically segment, categorize, and annotate video data, and to develop efficient tools for browsing and searching. We believe that the categorization of videos can be achieved by exploring the concepts and meanings of the videos. This task requires bridging the gap between low-level content and high-level concepts (or semantics). Once a relationship is established between the low-level computable features of the video and its semantics, the user would be able to navigate through videos through the use of concepts and ideas (for example, a user could extract only those scenes in an action film that actually contain fights) rather than sequentially browsing the whole video. However, this relationship must follow the norms of human perception and abide by the rules that are most often followed by the creators (directors) of these videos. These rules are called film grammar in video production literature. Like any natural language, this grammar has several dialects, but it has been acknowledged to be universal. Therefore, the knowledge of film grammar can be exploited effectively for the understanding of films. To interpret an idea using the grammar, we need first to understand the symbols, as in natural languages, and second, to understand the rules of combination of these symbols to represent concepts. In order to develop algorithms that exploit this film grammar, it is necessary to relate the symbols of the grammar to computable video features.

In this dissertation, we have identified a set of computable features of videos and have developed methods to estimate them. A computable feature of audio-visual data is defined as any statistic of available data that can be automatically extracted using image/signal processing and computer vision techniques. These features are global in nature and are extracted using whole images; therefore, they do not require any object detection, tracking or classification. These features include video shots, shot length, shot motion content, color distribution, key lighting, and audio energy. We use these features and exploit the knowledge of ubiquitous film grammar to solve three related problems: segmentation and categorization of talk and game shows; classification of movie genres based on previews; and segmentation and representation of full-length Hollywood movies and sitcoms.

First, we have developed a method for organizing videos of talk and game shows by automatically separating the program segments from the commercials and then classifying each shot as the host's or a guest's shot. In our approach, we rely primarily on information contained in shot transitions and utilize the inherent difference in the scene structure (grammar) of commercials and talk shows. A data structure called a shot connectivity graph is constructed, which links shots over time using temporal proximity and color similarity constraints. Analysis of the shot connectivity graph helps us to separate commercials from program segments. This is done by first detecting stories, and then assigning a weight to each story based on its likelihood of being a commercial or a program segment. We further analyze stories to distinguish shots of the hosts from those of the guests. We have performed extensive experiments on eight full-length talk shows (e.g. Larry King Live, Meet the Press, News Night) and game shows (Who Wants To Be A Millionaire), and have obtained excellent classification with 96% recall and 99% precision.
http://www.cs.ucf.edu/~vision/projects/LarryKing/LarryKing.html

Secondly, we have developed a novel method for genre classification of films using film previews. In our approach, we classify previews into four broad categories: comedies, action, dramas or horror films. Computable video features are combined in a framework with cinematic principles to provide a mapping to these four high-level semantic classes. We have developed two methods for genre classification: (a) a hierarchical method and (b) an unsupervised classification method. In the hierarchical method, we first classify movies into action and non-action categories based on the average shot length and motion content in the previews. Next, non-action movies are sub-classified into comedy, horror or drama categories by examining their lighting key. Finally, action movies are ranked on the basis of the number of explosion/gunfire events. In the unsupervised method, a mean shift classifier is used to discover the structure of the mapping between the computable features and each film genre. We have conducted extensive experiments on over a hundred film previews and demonstrated that low-level features can be efficiently utilized for movie classification. We achieved about 87% successful classification.
http://www.cs.ucf.edu/~vision/projects/movieClassification/movieClassification.html

Finally, we have addressed the problem of detecting scene boundaries in full-length feature movies. We have developed two novel approaches to automatically find scenes in videos. Our first approach is a two-pass algorithm. In the first pass, shots are clustered by computing backward shot coherence, a shot color similarity measure that detects potential scene boundaries (PSBs) in the videos. In the second pass, we compute scene dynamics for each scene as a function of shot length and the motion content in the potential scenes. In this pass, a scene-merging criterion is used to remove weak PSBs in order to reduce over-segmentation. In our second approach, we cluster shots into scenes by transforming this task into a graph-partitioning problem. This is achieved by constructing a weighted undirected graph called a shot similarity graph (SSG), where each node represents a shot and the edges between shots are weighted by their similarities (color and motion). The SSG is then split into sub-graphs by applying the normalized cut technique for graph partitioning. The partitions obtained represent individual scenes in the video. We further extend the framework to automatically detect the best representative key frames of identified scenes. With this approach, we are able to obtain a compact representation of huge videos in a small number of key frames. We have performed experiments on five Hollywood films (Terminator II, Top Gun, Gone In 60 Seconds, Golden Eye, and A Beautiful Mind) and one TV sitcom (Seinfeld) that demonstrate the effectiveness of our approach. We achieved about 80% recall and 63% precision in our experiments.
http://www.cs.ucf.edu/~vision/projects/sceneSeg/sceneSeg.html
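As a rough illustration of the shot-similarity-graph idea this abstract describes, the following is a minimal sketch, not the author's code: shots become graph nodes, edge weights combine color-histogram and motion similarity, and a single normalized cut bipartitions the graph (a full scene segmentation would apply the cut recursively to each sub-graph). The function names, the particular similarity measures and the toy data are assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import eigh

def shot_similarity_graph(color_hists, motion, alpha=0.5):
    """Weighted adjacency over shots: color-histogram intersection + motion similarity."""
    n = len(color_hists)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            color_sim = np.minimum(color_hists[i], color_hists[j]).sum()
            motion_sim = np.exp(-abs(motion[i] - motion[j]))
            W[i, j] = W[j, i] = alpha * color_sim + (1 - alpha) * motion_sim
    return W

def normalized_cut(W):
    """One bipartition via the second generalized eigenvector of (D - W) y = lambda D y."""
    D = np.diag(W.sum(axis=1))
    _, vecs = eigh(D - W, D)   # generalized symmetric eigenproblem, eigenvalues ascending
    return vecs[:, 1] >= 0     # sign of the Fiedler vector splits the shots into two scenes

# Toy usage: six shots with normalized 8-bin color histograms and scalar motion content.
rng = np.random.default_rng(0)
hists = [h / h.sum() for h in rng.random((6, 8))]
motion = rng.random(6)
print(normalized_cut(shot_similarity_graph(hists, motion)))
```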
13

Use of quantitative trait loci (QTL) affecting muscling in sheep for breeding

Masri, Amer January 2013 (has links)
Breeding programmes that use elite sires with the best estimated breeding values for muscling traits have achieved significant improvement in lamb production in the UK. Further acceleration of the rate of genetic gain for the desirable production traits could be achieved using DNA marker-assisted selection (MAS) breeding strategies. The underlying causal genetic variants associated with improved muscling may be unknown and lie within a cluster of genes known as a quantitative trait locus (QTL), or could be single nucleotide polymorphisms (SNP). LoinMAX™, the Texel muscling QTL (TM-QTL) and the c.*1232G>A myostatin mutation are genetic variants that were reported to be associated with improved muscling characteristics and hence were subjected to further analysis in this project. It is essential, before incorporating segregating genetic variants in any breeding scheme, to comprehensively evaluate their effects on carcass traits. In-vivo scanning (ultrasound scanning (US) and computed tomography (CT) scanning) and carcass video image analysis (VIA) were used in the current studies. Objective VIA-predicted weights of the carcass primal cuts could be the backbone of a value-based marketing system that has been suggested to replace the current Meat and Livestock Commission (MLC) carcass grades for conformation score (MLC-C) and fat class (MLC-F).

The effect of a single copy of the LoinMAX™ QTL (LM-QTL), compared to non-carriers, was evaluated in UK crossbred lambs out of Scottish Mule ewes. M. longissimus lumborum (MLL) width, depth and area, as measured by CT scanning, were significantly greater in lambs heterozygous for LM-QTL than in non-carriers. VIA detected a significant effect of the LM-QTL on the predicted weight of saleable meat yield in the loin primal cut (+2.2%; P < 0.05).

The effects of the ovine c.*1232G>A myostatin mutation (MM), found on sheep chromosome 2, on carcass traits were studied in heterozygous crossbred lambs sired by Texel and Poll Dorset rams. Texel crossbred lambs carrying MM had increased loin depth and area. In both crosses, MM carriers had significantly higher CT-estimated lean weight and proportion (2 to 4%) and muscle-to-bone ratios (by ~3%). Poll Dorset heterozygous crossbred animals had a higher muscle-to-fat ratio (28%) and significantly lower fat-related measurements.

The c.*1232G>A (MM) mutation as well as TM-QTL effects were then evaluated in a different genetic background, Texel x Welsh Mountain crossbred lambs. Carrying two copies of MM was associated with a significant positive effect on 8-week weight, a negative effect on ultrasound fat depth, a substantial decrease in MLC fat score, and a positive impact on the VIA-estimated weight of the hind leg, chump and loin primal cuts, as well as on the muscularity of the hind leg and loin regions, with greater loin muscle width, depth and area. Two copies of MM also altered lambs' morphological traits, with significantly wider carcasses across the shoulders, breast and hind legs and greater areas of the back view of the carcass when measured by VIA. TM-QTL significantly increased US muscle depth, and TM-QTL carriers had significantly greater loin muscle width and area measurements. TM-QTL genetic groups (homozygous carriers (TM/TM), heterozygous carriers with a paternal or maternal origin of the allele (TM/+ and +/TM, respectively) and homozygous non-carriers (+/+)) and the TM-QTL mode of action were then compared. TM/TM carcasses were significantly heavier than those of non-carriers, by 1.6 kg, and scored higher conformation values when compared to the heterozygous groups only. TM/+ lambs had significantly higher VIA-predicted weight and muscularity in the hind leg and loin, and higher loin dimensions, relative to some other genotypic groups. The effect of TM-QTL on some carcass shape measurements was also significant. The results on the TM-QTL mode of action for the loin muscling traits support earlier reports of polar overdominance.

In the light of growing calls to replace the current subjective carcass payment system with an objective VIA system that values the carcass according to the superiority of its cuts, I investigated the ability of US and CT measurements to predict the VIA-estimated weights of the carcass primal cuts. Several prediction equations were examined; the best results were achieved when an ultrasound measurement, CT linear measurements and live weight were fitted in the model. Since CT scanning information on elite sires is now being used for genetic selection for carcass merit, genetic parameters and genetic relationships between CT scanning measurements and post-mortem traits (VIA and MLC-F) were also estimated; however, the results were not sufficiently accurate to be of practical use due to a lack of data.
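To make the kind of prediction equation described above concrete, here is a minimal sketch of regressing a VIA-estimated primal-cut weight on an ultrasound measurement, a CT linear measurement and live weight by ordinary least squares. The variable names, units and synthetic data are illustrative assumptions, not the thesis data or model.

```python
import numpy as np

# Synthetic stand-ins for the measurements named in the abstract.
rng = np.random.default_rng(1)
n = 60
us_depth = rng.normal(30, 3, n)   # ultrasound muscle depth (mm), illustrative
ct_width = rng.normal(60, 5, n)   # CT linear loin measurement (mm), illustrative
live_wt = rng.normal(40, 4, n)    # live weight (kg), illustrative
loin_wt = 0.02 * us_depth + 0.015 * ct_width + 0.05 * live_wt + rng.normal(0, 0.1, n)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), us_depth, ct_width, live_wt])
beta, *_ = np.linalg.lstsq(X, loin_wt, rcond=None)
print("intercept, b_us, b_ct, b_live =", np.round(beta, 3))
```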
14

Statistical semantic analysis of spatio-temporal image sequences

Luo, Ying, January 2004 (has links)
Thesis (Ph. D.)--University of Washington, 2004. / Vita. Includes bibliographical references (p. 99-105).
15

Comparison of GPS-Equipped Vehicles and Its Archived Data for the Estimation of Freeway Speeds

Lee, Jaesup 09 April 2007 (has links)
Video image detection system (VDS) equipment provides real-time traffic data for monitored highways directly to the traffic management center (TMC) of the Georgia Department of Transportation. However, at any given time, approximately 30 to 35% of the 1,600 camera stations (STNs) fail to work properly. The main reasons for malfunctions in the VDS system include long-term road construction activity and operational limitations. Thus, providing alternative data sources for offline VDS stations, and developing tools that can help detect problems with VDS stations, can facilitate the successful operation of the TMC. To estimate travel speeds at non-working STNs, this research examined global positioning system (GPS) data from vehicles using the ATMS-monitored freeway system as a potential alternative to VDS. The goal of this study is to compare VDS speed data for the estimation of freeway travel speeds with GPS-equipped vehicle trip data, and to assess the differences between these measurements as a potential function of traffic and roadway conditions, environmental conditions, and driver/vehicle characteristics. The difference between GPS and VDS speeds is affected by various factors such as congestion level (expressed as level of service), on-road truck percentage, facility design (number of lanes and freeway sub-type), posted speed limit, weather, daylight, and time of day. The relationship between the monitored speed difference and congestion level was particularly strong and was observed to interact with most other factors. Classification and regression tree (CART) analysis indicated that driver age was the most relevant variable in explaining variation for the southbound freeway dataset, while freeway sub-type, speed limit, driver age, and number of lanes were the most influential variables for the northbound freeway dataset. The combination of several variables contributed significantly to the reduction in deviance for both the northbound and the southbound datasets. Although this study identifies potential relationships between speed difference and various factors, the results of the CART analysis should be interpreted with the driver sample size in mind; expanded sampling with a larger number of drivers would be needed to yield statistically significant results and would enrich the study's findings.
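As a hedged illustration of the CART analysis mentioned above, the sketch below fits a regression tree to a synthetic GPS-VDS speed-difference dataset and prints the fitted splits. The predictor set mirrors factors named in the abstract, but the data, coefficients and column names are invented for illustration, not drawn from the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([
    rng.integers(20, 70, n),      # driver age (years)
    rng.choice([55, 65, 70], n),  # posted speed limit (mph)
    rng.integers(2, 6, n),        # number of lanes
    rng.integers(1, 7, n),        # level of service (1 = A ... 6 = F)
])
# Synthetic GPS-VDS speed difference: grows with congestion and driver age.
speed_diff = 0.8 * X[:, 3] + 0.05 * X[:, 0] + rng.normal(0, 1, n)

tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=30).fit(X, speed_diff)
print(export_text(tree, feature_names=["age", "speed_limit", "lanes", "LOS"]))
```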
16

Non-Destructive Evaluation and Mathematical Modeling of Beef Loins Subjected to High Hydrodynamic Pressure Treatment

Lakshmikanth, Anand 15 September 2009 (has links)
High hydrodynamic pressure (HDP) treatment is a novel non-thermal technology that improves tenderness in foods by subjecting them to underwater shock waves. In this study, non-destructive and destructive testing methods, along with two mathematical models, were explored to predict the biomechanical behavior of beef loins subjected to HDP treatment. The first study used ultrasound and imaging techniques to predict textural changes in beef loins subjected to HDP treatment, using Warner-Bratzler shear force (WBS) scores and texture profile analysis (TPA) features for correlation. Ultrasound velocity correlated very poorly with the WBS scores and TPA features, whereas the imaging features correlated better, with higher r-values. The effect of the HDP treatment variables on WBS and TPA features indicated that the amount of charge had no significant effect compared to sample location and container size during treatment. Two mathematical models were used to simulate deformational behavior in beef loins. The first, a preliminary study, used rheology-based modeling of a protein gel; its results indicated no viscous interactions in the model and complete deformation failure at pressures exceeding 50 kPa, contrary to the real-life process conditions, where pressures are on the order of megapascals. The second modeling study used a finite element method approach to model elastic behavior, with the shock wave modeled as either a non-linear or a linear propagating wave. The non-linear model indicated no deformation response, whereas the linear model indicated a realistic deformation response assuming transverse isotropy of the model beef loin. The last study correlated small- and large-strain measurements, using stress relaxation and the elastic coefficients of the stiffness matrix as small-strain measures; the results indicated very high correlations of the elastic coefficients c11, c22 and c44 with TPA cohesiveness (r > 0.9) and springiness (r > 0.85). Overall, the results of this study indicated a need for further research on estimating the mechanical properties of beef loins in order to better understand the dynamics of the HDP treatment process. / Ph. D.
17

Fleischleistung und Fleischqualität bei Weidenkälbern unter Berücksichtigung des mit Videobildanalyse bestimmten Fettanteils im M. longissimus dorsi / Meat performance and meat quality of pasture calves, considering the fat content of the M. longissimus dorsi determined by video image analysis

Sanaa, Djamel 14 February 2010 (has links)
In order to estimate performance level, systematic effects and phenotypic correlations, characteristics of meat performance and meat quality were recorded and analysed in 276 pasture calves of different genotypes from cow-calf operations (material I). In addition to the identity and basic life data of the animals, the following characteristics of meat performance were included: age at slaughter, live weight, lifetime daily gain, carcass weight, net gain, as well as the components of the carcass grade, i.e. grade for meatiness and grade for fatness. The following characteristics of meat quality were measured on meat samples 48 hours and 14 days after slaughter, respectively: pH value (pH1, pH2), meat colour with lightness L* (L*1, L*2), redness a* (a*1, a*2) and yellowness b* (b*1, b*2), Warner-Bratzler shear force raw (WBS1, WBS2) and cooked (WBS3), as well as intramuscular fat content (IMF).
18

L'impact des tempêtes sur les plages de poche aménagée / Storm impact on engineered pocket beaches

De Santiago Gonzalez, Inaki Camus 18 December 2014 (has links)
The aim of this study is to understand the morphodynamic response of a partially engineered pocket beach to storm events. To that end, a series of video images, field topographic measurements and a depth-averaged (2DH) process-based model were used. The beach of Zarautz (Spain) was chosen as the study site because of its wave climate and configuration: it is bounded laterally by rocky cliffs and comprises two well-defined regions, a dune system on the eastern part and a seawall topped by a promenade along the rest of the beach. The offshore wave climate (Bilbao buoy) is characterised by low directional variability, with 95% of the waves coming from directions between West (W) and North (N); the high-energy events are seasonal, with most storms occurring in winter and autumn. The wave conditions approaching the beach of Zarautz are almost unidirectional and can present some alongshore variability.

The temporal and spatial variability of the nearshore sandbars was studied from daily video observations over a two-year period. Hydrodynamically, the beach behaves most of the time like an open beach, but it can also present cellular and transitional circulation during high-energy events. The nearshore sandbar morphology covers a wide range of temporal and spatial variability, and the western, engineered and more sheltered section of the beach sometimes exhibits a beach state different from that of the eastern section.

To study the morphological response of the beach to high-energy events, systematically designed topographic surveys were undertaken before and after several storms. The location of the rip currents appears to play a role in beach erosion: rips that are static and persistent during moderate to high energy conditions can locally erode the intertidal zone. Under high-energy conditions combined with spring tides, the backshore and the dune system are eroded; the dune and backshore are important because they act as a buffer preventing foreshore erosion. Conversely, when high-energy conditions coincide with neap tides, the evolution of the foreshore, backshore and dunes is controlled mainly by the wave characteristics rather than by the tidal range.

The findings from the video images and field measurements were complemented with the open-source XBeach process-based model. In the absence of a pre-storm bathymetry, the Beach Wizard data-assimilation model was used to estimate the bathymetry from the images collected by the video station. The possibility of forcing the model with alongshore-varying wave conditions along the boundary of the computational domain was implemented; the results show that this implementation improves the model's ability to estimate the bathymetry. The XBeach calibration tests reveal that the results can vary considerably with the chosen parameters (short-wave run-up, γ, γua, eps and hmin appear relevant for calibration), whereas the results seem insensitive to the characteristics of the wave spectrum used to force the model.

A series of simulations was performed to study the storm cluster of February 2013, analysing the influence of both the chronology of the individual storms and the water level over the period. These simulations revealed a chain-transport mechanism in which sand is transferred from the dunes to the intertidal zone, never the other way around, and with no dune-rebuilding period. The erosion of the different sections of the beach is strongly correlated with the tidal level rather than with the wave power: dune and backshore erosion occurs only when high tidal levels prevail, whereas the intertidal zone is eroded at low tide. The storm impact on the beach thus depends far more on the water level than on the chronology of the energetic events within a cluster. The main differences in beach response between the natural and engineered sections are related to the sand budget: the complete loss of backshore sand leaves the intertidal zone vulnerable to storms (the chain transport is interrupted), a scenario likely to occur only in the engineered sector, owing to its narrow backshore and the absence of a dune system. Finally, some tests were performed to relate 'storm magnitude' to a given amount of beach erosion. In general, the higher the storm power, the larger the beach erosion; however, the wave characteristics that define a given storm play an important role, and in some cases a low-power storm with high Hs and Tp can produce larger changes on the beach than a larger storm with low Hs and Tp.
19

L'image malsaine : le trouble identitaire du cinéma Américain et ses modes de représentations dans les années quatre-vingt-dix / The unhealthy image: the identity disorder of American cinema and its modes of representation in the 1990s

Le Gouez, Guillaume 24 March 2017 (has links)
A hybrid image born of the contamination of the film image by the video image and then by the digital image, the unhealthy image of American cinema in the nineties stems from the technological changes of the late twentieth century. From an aesthetic point of view, the unhealthy image, usually situated at the edges of the ethics of representation, is not the result of a moral sentence passed by a viewer on a viewed subject; it must instead be treated as a medical metaphor capable of bringing to light the dangerous relationships that the different kinds of images may have had with one another. In these relationships and mixtures that it brings into play, the unhealthy image sketches the cinema of the future and reinvents both the material and the language of traditional American cinema. Modification of the genetic code of the image, the birth of a representation of the digital body, and an insatiable desire to mix all sorts of visuals to create out-of-the-ordinary compositions are all factors that pushed the cinematographic image of the nineties to reinvent itself. The image was already abject, grotesque, perverse, even pornographic, in the American cinema of the seventies; it became hybrid, mutant and, ultimately, unhealthy in the nineties.
20

Experiential Sampling For Object Detection In Video

Paresh, A 05 1900 (has links)
The problem of object detection deals with determining whether an instance of a given class of object is present or not. There are robust, supervised-learning-based algorithms available for object detection in an image. These image object detectors (image-based object detectors) use characteristics learnt from the training samples to find object and non-object regions. The characteristics used are such that the detectors work under a variety of conditions and hence are very robust. Object detection in video can be performed by using such a detector on each frame of the video sequence. This approach checks for the presence of an object around each pixel, at different scales. Such a frame-based approach completely ignores the temporal continuity inherent in the video: the detector declares the presence of the object independently of what has happened in past frames. Also, various visual cues such as motion and color, which give hints about the location of the object, are not used.

The current work is aimed at building a generic framework for using a supervised-learning-based image object detector for video that exploits temporal continuity and the presence of various visual cues. We use temporal continuity and visual cues to speed up detection and improve detection accuracy by considering past detection results. We propose a generic framework, based on Experiential Sampling [1], which considers temporal continuity and visual cues to focus on a relevant subset of each frame. We determine some key positions in each frame, called attention samples, and object detection is performed only at scales with these positions as centers. These key positions are statistical samples from a density function that is estimated based on various visual cues, past experience and temporal continuity. This density estimation is modeled as a Bayesian filtering problem and is carried out using sequential Monte Carlo methods (also known as particle filtering), where a density is represented by a weighted sample set. The experiential sampling framework is inspired by Neisser's perceptual cycle [2] and Itti-Koch's static visual attention model [3].

In this work, we first use basic experiential sampling as presented in [1] for object detection in video and show its limitations. To overcome these limitations, we extend the framework to effectively combine top-down and bottom-up visual attention phenomena. We use the learning-based detector's response, which is a top-down cue, along with visual cues to improve the attention estimate. To effectively handle multiple objects, we maintain a minimum number of attention samples per object. We propose to use motion as an alert cue to reduce the delay in detecting new objects entering the field of view, and we use an inhibition map to avoid revisiting already attended regions. Finally, we improve detection accuracy by using a particle-filter-based detection scheme [4], also known as track before detect (TBD). In this scheme, we compute the likelihood of the presence of the object based on current and past frame data; this likelihood is shown to be approximately equal to the product of average sample weights over past frames. Our framework results in a significant reduction in the overall computation required by the object detector, with an improvement in accuracy while retaining its robustness. This enables the use of learning-based image object detectors in real-time video applications which would otherwise be computationally expensive.
We demonstrate the usefulness of this framework for frontal face detection in video, using the Viola-Jones frontal face detector [5] and color and motion visual cues. We show results for various cases, such as sequences with a single object, multiple objects, distracting background, moving camera, changing illumination, objects entering/exiting the frame, crossing objects, objects with pose variation, and sequences with scene change.

The main contributions of the thesis are: (i) we give an experiential sampling formulation for object detection in video, precisely defining concepts such as attention point and attention density which are vague in [1]; (ii) we combine the detector's response with visual cues to estimate attention, inspired by the combination of top-down and bottom-up attention maps in visual attention models; to the best of our knowledge, this is used for the first time for object detection in video; (iii) in the case of multiple objects, we highlight the problem with sample-based density representation and solve it by maintaining a minimum number of attention samples per object; (iv) for objects first detected by the learning-based detector, we propose a TBD scheme for their subsequent detections along with the learning-based detector, which improves accuracy compared to using the learning-based detector alone.

This thesis is organized as follows. Chapter 1: a brief survey of related work and the definition of our problem. Chapter 2: an overview of the biological models that have motivated our work. Chapter 3: the experiential sampling formulation as in previous work [1], its results, and a discussion of its limitations. Chapter 4: enhanced experiential sampling, with enhancements that overcome the limitations of basic experiential sampling, and the track-before-detect scheme to improve detection accuracy. Chapter 5: conclusions and possible directions for future work. Appendix A: a description of the video database used in this thesis. Appendix B: a list of commonly used abbreviations and notations.
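As a rough sketch of the attention-sampling machinery described above (not the thesis implementation), the following minimal particle filter propagates attention samples over image positions with random-walk dynamics, re-weights them by a per-frame visual-cue map, and resamples. The cue map, the dynamics and all parameter values are assumptions made for illustration.

```python
import numpy as np

def attention_step(particles, weights, cue_map, sigma=5.0, rng=np.random.default_rng()):
    """One predict/update/resample cycle for attention samples over image positions."""
    h, w = cue_map.shape
    # Predict: random-walk dynamics encode temporal continuity.
    particles = particles + rng.normal(0, sigma, particles.shape)
    particles[:, 0] = np.clip(particles[:, 0], 0, h - 1)
    particles[:, 1] = np.clip(particles[:, 1], 0, w - 1)
    # Update: weight each sample by the visual-cue (saliency) value at its position.
    ij = particles.astype(int)
    weights = weights * cue_map[ij[:, 0], ij[:, 1]]
    weights = weights / weights.sum()
    # Resample (multinomial) so samples concentrate on salient regions.
    idx = rng.choice(len(weights), size=len(weights), p=weights)
    return particles[idx], np.full(len(weights), 1.0 / len(weights))

# Toy usage: one salient "object" region in a 100x100 cue map.
cue = np.full((100, 100), 1e-3)
cue[40:60, 40:60] = 1.0
parts = np.random.default_rng(2).uniform(0, 100, (200, 2))
wts = np.full(200, 1.0 / 200)
for _ in range(10):
    parts, wts = attention_step(parts, wts, cue)
print(parts.mean(axis=0))  # attention samples concentrate near the object (~[50, 50])
```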
