131 |
On the Construction of an Automatic Traffic Sign Recognition System
Jonsson, Fredrik (January 2017)
This thesis proposes an automatic road sign recognition system that covers every step from the initial detection of road signs in a digital image to the final recognition step that determines the class of the sign. For the detection step, we develop a Bayesian approach to image segmentation that uses colour information in the HSV (Hue, Saturation and Value) colour space. The segmentation relies on a probability model built from manually extracted colour data of road signs collected from real images. We show how the colour data is fitted with mixtures of multivariate normal distributions, whose parameters are estimated by Gibbs sampling. The fitted models are then used to compute the posterior probability that a pixel colour belongs to a road sign. Following the image segmentation, regions of interest (ROIs) are detected with the Maximally Stable Extremal Region (MSER) algorithm and then classified by a cascade of classifiers. The classifiers are trained on synthetic images, generated by applying various random distortions to a set of template images covering most road signs in Sweden, and we demonstrate that such synthetic images provide satisfactory recognition rates. We focus on a large set of the signs on the Swedish road network, comprising almost 200 road signs. As classification models we use the Support Vector Machine (SVM) and Random Forest (RF), with Histogram of Oriented Gradients (HOG) features.
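The posterior computation described in this abstract can be illustrated with a minimal sketch. It assumes HSV values scaled to [0, 1], diagonal covariances, an equal prior, and made-up mixture parameters; the thesis fits its mixtures by Gibbs sampling, which is not reproduced here, and all names below are hypothetical.

```python
import math

def diag_gaussian_pdf(x, mean, var):
    """Density of a multivariate normal with diagonal covariance (simplifying assumption)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * math.log(2 * math.pi * vi) - (xi - mi) ** 2 / (2 * vi)
    return math.exp(log_p)

def mixture_pdf(x, components):
    """Mixture density; components is a list of (weight, mean, var) tuples."""
    return sum(w * diag_gaussian_pdf(x, m, v) for w, m, v in components)

def posterior_sign(x, sign_mix, background_mix, prior_sign=0.5):
    """Bayes' rule: P(sign | colour) from the two class-conditional mixtures."""
    num = prior_sign * mixture_pdf(x, sign_mix)
    den = num + (1 - prior_sign) * mixture_pdf(x, background_mix)
    return num / den if den > 0 else 0.0

# Hypothetical one-component mixtures over (H, S, V); not fitted to real data.
SIGN_MIX = [(1.0, (0.02, 0.8, 0.6), (0.01, 0.02, 0.05))]
BACKGROUND_MIX = [(1.0, (0.4, 0.3, 0.5), (0.05, 0.05, 0.08))]

if __name__ == "__main__":
    print(posterior_sign((0.02, 0.8, 0.6), SIGN_MIX, BACKGROUND_MIX))
    print(posterior_sign((0.4, 0.3, 0.5), SIGN_MIX, BACKGROUND_MIX))
```

Thresholding this posterior per pixel would give a binary segmentation mask. The sketch ignores the circular nature of hue, which a real model would have to handle.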
|
132 |
Dynamic Data-Driven Visual Surveillance of Human Crowds via Cooperative Unmanned Vehicles
Minaeian, Sara (January 2017)
Visual surveillance of human crowds in dynamic environments has attracted considerable computer vision research in recent years. Moving object detection, which conventionally includes motion segmentation and, optionally, object classification, is the first major task in any visual surveillance application. Once the targets are detected, their geo-locations must be estimated so that all targets share one reference coordinate system for higher-level decision-making. Depending on the required fidelity of the decision, multi-target data association may also be needed at higher levels to distinguish multiple targets across a series of frames. Applying these vision-based algorithms to a crowd surveillance system (a major application studied in this dissertation) that uses a team of cooperative unmanned vehicles (UVs) introduces new challenges. Because the visual sensors move with the UVs, both the targets and the environment are dynamic, which adds to the complexity and uncertainty of the video processing. Moreover, the limited onboard computational resources call for more efficient algorithms. In response to these challenges, the goal of this dissertation is to design and develop an effective and efficient visual surveillance system, based on the dynamic data-driven application system (DDDAS) paradigm, to be used by cooperative UVs for autonomous crowd control and border patrol.
The proposed visual surveillance system comprises four modules: 1) a motion detection module, in which a new sliding-window-based method for detecting multiple moving objects is proposed to segment the moving foreground using the moving camera onboard the unmanned aerial vehicle (UAV); 2) a target recognition module, in which a customized method based on histograms of oriented gradients is applied to classify human targets using the onboard camera of the unmanned ground vehicle (UGV); 3) a target geo-localization module, in which a new moving-landmark-based method is proposed for estimating the geo-location of the detected crowd from the UAV, while a heuristic method based on triangulation is applied for geo-locating the detected individuals via the UGV; and 4) a multi-target data association module, in which the affinity score is dynamically adjusted to follow the changing dispersion of the detected targets over successive frames. In this dissertation, a cooperative team of one UAV and multiple UGVs with onboard visual sensors is used to take advantage of the complementary characteristics (e.g. different fidelities and view perspectives) of these UVs for crowd surveillance. The DDDAS paradigm is also applied to these vision-based modules, where the computational and instrumentation aspects of the application system are unified for more accurate or more efficient analysis, depending on the scenario. To illustrate and demonstrate the proposed system, aerial and ground video sequences from the UVs, as well as simulation models, are developed and used in experiments. The experimental results on both the developed videos and datasets from the literature show the effectiveness and efficiency of the proposed modules and their promising performance in the considered crowd surveillance application.
|
133 |
Extraction of Key-Frames from an Unstable Video Feed
Vempati, Nikhilesh (28 September 2017)
The APOLI project deals with Automated Power Line Inspection using highly automated Unmanned Aerial Systems. Besides real-time damage assessment through on-board exploitation of high-resolution image data, post-processing of the video data is necessary. This master's thesis implements an Isolator Detector Framework and a workflow in the Automotive Data and Time-triggered Framework (ADTF) that loads a video directly from a camera or from storage and extracts the key frames that contain objects of interest. This is done by implementing an object detection system in C++ and creating ADTF filters that detect the objects of interest and extract the key frames using a supervised learning platform. The use case is the extraction, from video samples, of frames that contain images of isolators on power transmission lines.
|
134 |
Automatic Eartag Recognition on Dairy Cows in Real Barn Environment
Ilestrand, Maja (January 2017)
All dairy cows in Europe wear unique identification tags in their ears. These eartags are standardized and contain the cows' identification numbers, which today are used only for visual identification by the farmer. The cow also needs to be identified by an automatic identification system connected to the milking machines and other robotics used on the farm. Currently this is solved with a non-standardized radio transmitter, which can be placed in different positions on the cow, and different receivers need to be used on different farms. Further drawbacks of the current identification system are that it is expensive and unreliable. This thesis explores the possibility of replacing this non-standardized, radio-frequency-based identification system with a standardized, computer-vision-based system. The proposed method uses a colour threshold approach for detection; a flood fill approach followed by a Hough transform and a projection method for segmentation; and evaluates template matching, k-nearest neighbour and support vector machines as optical character recognition (OCR) methods. The results show that the quality of the input data is vital. With good data, k-nearest neighbour, which showed the best results of the three OCR approaches, handles 98 % of the digits.
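As an illustration of the k-nearest neighbour OCR step named in this abstract, the following is a minimal sketch of k-NN classification on flattened binary digit images. The 3x3 bitmaps, the squared Euclidean distance, and all names are assumptions made for the example; the thesis's actual features and data are not given in the abstract.

```python
from collections import Counter

def knn_predict(sample, training_set, k=3):
    """Return the majority label among the k training vectors closest to sample.

    training_set is a list of (feature_vector, label) pairs; distance is
    squared Euclidean, which ranks neighbours the same as Euclidean.
    """
    by_distance = sorted(
        training_set,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(sample, item[0])),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy training set: flattened 3x3 bitmaps of "0" and "1".
TRAIN = [
    ([1, 1, 1, 1, 0, 1, 1, 1, 1], "0"),
    ([1, 1, 1, 1, 0, 1, 1, 1, 0], "0"),
    ([0, 1, 1, 1, 0, 1, 1, 1, 1], "0"),
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], "1"),
    ([0, 1, 0, 0, 1, 0, 0, 1, 1], "1"),
    ([1, 1, 0, 0, 1, 0, 0, 1, 0], "1"),
]

if __name__ == "__main__":
    noisy_one = [0, 1, 0, 0, 1, 0, 1, 1, 0]
    print(knn_predict(noisy_one, TRAIN))
```

In a real system each segmented digit would be resized to a fixed grid before comparison, which is one reason the quality of the segmentation matters so much for the final recognition rate.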
|
135 |
A Scalable Multimedia Content Processing Framework with Application to TV Shopping
Fleites, Fausto C (12 May 2014)
The advent of smart TVs has reshaped the TV-consumer interaction by combining TVs with mobile-like applications and access to the Internet. However, consumers are still unable to interact seamlessly with the content being streamed. An example of this limitation is TV shopping, in which a consumer purchases a product or item displayed in the current TV show. Currently, consumers can only stop the current show and attempt to find a similar item on the Web or in an actual store. It would be more convenient if the consumer could interact with the TV to purchase interesting items.
Towards the realization of TV shopping, this dissertation proposes a scalable multimedia content processing framework. Two main challenges in TV shopping are addressed: the efficient detection of products in the content stream, and the retrieval of similar products given a consumer-selected product. The proposed framework consists of three components. The first component performs computation- and time-aware multimedia abstraction to select a reduced number of frames that summarize the important information in the video stream. By both reducing the number of frames and taking into account the computational cost of the subsequent detection phase, this component allows the efficient detection of products in the stream. The second component realizes the detection phase. It executes scalable product detection using multi-cue optimization. Additional information cues are formulated into an optimization problem that allows the detection of complex products, i.e., those that do not have a rigid form and can appear in various poses. After the second component identifies products in the video stream, the consumer can select an interesting one, for which similar ones must be located in a product database. To this end, the third component of the framework consists of an efficient, multi-dimensional, tree-based indexing method for multimedia databases. The proposed index mechanism serves as the backbone of the search. Moreover, it is able to efficiently bridge the semantic gap and handle perception subjectivity during the retrieval process to provide more relevant results.
|
136 |
SLAMIt A Sub-Map Based SLAM System : On-line creation of multi-leveled map
Holmquist, Karl (January 2017)
In many situations after a major catastrophe, such as the one in Fukushima, the disaster area is highly dangerous for humans to enter. In such environments, a semi-autonomous robot could limit the risks to humans by exploring and mapping the area on its own. This thesis sets out to design and implement a software-based SLAM system with the potential to run in real time using a Kinect 2 sensor as input. The focus of the thesis has been to create a system that allows efficient storage and representation of the map, so that large environments can be explored. This is done by separating the map into different abstraction levels, corresponding to local metric maps connected by a global topological map. This structure has been kept in mind throughout the implementation in order to allow modularity, which makes it possible to exchange each sub-component of the system if needed and to use different types of sensors. The thesis is broad in the sense that it uses techniques from distinct areas to solve the sub-problems that arise. Some examples are object detection and classification, point-cloud registration, and efficient 3D-based occupancy trees.
|
137 |
Machine Learning Methods for Visual Object Detection / Apprentissage machine pour la détection des objets
Hussain, Sabit ul (07 December 2011)
The goal of this thesis is to develop better practical methods for detecting common object classes in real-world images.
We present a family of object detectors that combine Histogram of Oriented Gradient (HOG), Local Binary Pattern (LBP) and Local Ternary Pattern (LTP) features with efficient Latent SVM classifiers and effective dimensionality reduction and sparsification schemes to give state-of-the-art performance on several important datasets including PASCAL VOC2006 and VOC2007, INRIA Person and ETHZ. The three main contributions are as follows. Firstly, we pioneer the use of Local Ternary Pattern features for object detection, showing that LTP gives better overall performance than HOG and LBP, because it captures both rich local texture and object shape information while being resistant to variations in lighting conditions. It thus works well both for classes that are recognized mainly by their structure and ones that are recognized mainly by their textures. We also show that HOG, LBP and LTP complement one another, so that an extended feature set that incorporates all three of them gives further improvements in performance. Secondly, in order to tackle the speed and memory usage problems associated with high-dimensional modern feature sets, we propose two effective dimensionality reduction techniques. The first, feature projection using Partial Least Squares, allows detectors to be trained more rapidly with negligible loss of accuracy and no loss of run time speed for linear detectors. The second, feature selection using SVM weight truncation, allows active feature sets to be reduced in size by almost an order of magnitude with little or no loss, and often a small gain, in detector accuracy. Despite its simplicity, this feature selection scheme outperforms all of the other sparsity enforcing methods that we have tested. 
Lastly, we describe work in progress on Local Quantized Patterns (LQP), a generalized form of local pattern features that uses lookup table based vector quantization to provide local pattern style pixel neighbourhood codings that have the speed of LBP/LTP and some of the flexibility and power of traditional visual word representations. Our experiments show that LQP outperforms all of the other feature sets tested including HOG, LBP and LTP.
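To make the LBP and LTP features named in this abstract concrete, here is a minimal sketch of how the codes are computed for a single 3x3 pixel neighbourhood. It follows the commonly used definitions (LBP thresholds each neighbour against the centre; LTP adds a tolerance t and splits the ternary code into an upper and a lower binary pattern); the neighbour ordering, the tolerance value, and the function names are assumptions for illustration, not details taken from the thesis.

```python
# Neighbour positions in a 3x3 patch, clockwise from the top-left corner.
NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def lbp_code(patch):
    """Local Binary Pattern: bit i is set when neighbour i >= centre pixel."""
    centre = patch[1][1]
    code = 0
    for bit, (r, c) in enumerate(NEIGHBOURS):
        if patch[r][c] >= centre:
            code |= 1 << bit
    return code

def ltp_codes(patch, t=5):
    """Local Ternary Pattern, returned as (upper, lower) binary codes.

    A neighbour is +1 if it is at least centre + t, -1 if it is at most
    centre - t, and 0 otherwise. The +1 entries form the upper pattern
    and the -1 entries the lower pattern.
    """
    centre = patch[1][1]
    upper = lower = 0
    for bit, (r, c) in enumerate(NEIGHBOURS):
        value = patch[r][c]
        if value >= centre + t:
            upper |= 1 << bit
        elif value <= centre - t:
            lower |= 1 << bit
    return upper, lower

if __name__ == "__main__":
    patch = [[60, 52, 40],
             [55, 50, 45],
             [50, 48, 70]]
    print(lbp_code(patch), ltp_codes(patch, t=5))
```

A detector would histogram these codes over cells of the detection window, in the same way HOG histograms gradient orientations. The tolerance band around the centre value is what makes LTP less sensitive than LBP to noise in near-uniform regions.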
|
138 |
Detecção de objetos em vídeos usando misturas de modelos baseados em partes deformáveis obtidas de um conjunto de imagens / Object detection in video using mixtures of deformable part models obtained from a image set
Leissi Margarita Castaneda Leon (23 October 2012)
Detecting objects that belong to a specific class in videos is a widely studied problem because of its potential applications. For videos taken from a stationary camera, applications include security and traffic surveillance; for videos taken from a dynamic camera, a possible application is driver assistance or autonomous driving. The literature presents several approaches that address each of these cases and train their detectors only on images from a single type of camera (stationary or dynamic). This can lead to poor performance when the techniques are applied to image sequences from different types of camera. The state of the art in detecting objects of a single class shows a tendency towards histograms and supervised training, and basically follows this structure: construction of the object class model, detection of candidates in the image/frame, and application of a distance measure to those candidates. Another disadvantage is that some approaches use a separate model for each viewpoint of the object, generating many models and, in some cases, one classifier per viewpoint.
In this work, we approach the problem of object detection using a model of the object class created from a dataset of static images, and we use the model to detect objects in videos (sequences of images) collected from static and dynamic cameras, i.e., in a setting totally different from the one used for training. The model is created in an off-line learning phase using the PASCAL 2007 image database, which includes cars seen from several viewpoints. The model is based on a mixture of deformable part models (MDPM), originally proposed by Felzenszwalb et al. (2010b) for detection in static images. We do not limit the model to any specific viewpoint. A set of experiments was designed to explore the best number of mixture components, as well as the number of parts of the model. In addition, we performed a comparative study of symmetric and asymmetric MDPMs. We evaluated the proposed method on detecting people and cars in videos obtained by a static or a dynamic camera. Our results not only show good performance of the MDPM, and better results than state-of-the-art approaches to object detection in videos obtained from stationary or dynamic cameras, but also indicate the best number of mixture components and parts for the created model. Finally, the results show differences between symmetric and asymmetric MDPMs in the detection of objects in different videos.
|
139 |
Edge Machine Learning for Animal Detection, Classification, and Tracking
Tydén, Amanda; Olsson, Sara (January 2020)
The use of machine learning on camera trap data is an advancing research field, yet few works explore deep learning for camera traps that runs in real time. A camera trap captures images of passing animals and is traditionally triggered only by motion detection. This work integrates machine learning on the edge device so that it also performs object detection. Related research is reviewed, and model tests are performed with a focus on the trade-off between inference speed and model accuracy. Transfer learning is used to take advantage of pre-trained models and thus reduce training time and the amount of training data required. Four models with slightly different architectures are compared to evaluate which performs best for the use case: SSD MobileNet V2, SSD Inception V2, SSDLite MobileNet V2, and quantized SSD MobileNet V2. Because the model runs on the client side, SSD MobileNet V2 was finally selected for its satisfactory trade-off between inference speed and accuracy. Even though it is less accurate in its detections, its ability to process more images per second makes it outperform the more accurate Inception network in object tracking. A contribution of this work is a lightweight tracking solution using tubelet proposals. This work further discusses the open set recognition problem, where only a few object classes are of interest while many others are present. Open set recognition influences data collection and evaluation tests; how to integrate support for it in object detection models is left for future work. The proposed system handles detection, classification, and tracking of animals in the African savannah, and has potential for real usage as it produces meaningful events.
|
140 |
Sensor Fusion for 3D Object Detection for Autonomous Vehicles
Massoud, Yahya (14 October 2021)
Thanks to major advancements in hardware and computational power, sensor technology, and artificial intelligence, the race for fully autonomous driving systems is heating up. Faced with countless challenging conditions and driving scenarios, researchers are tackling the hardest problems in driverless cars. One of the most critical components is the perception module, which enables an autonomous vehicle to "see" and "understand" its surrounding environment. Given that modern vehicles can have a large number of sensors and available data streams, this thesis presents a deep learning-based framework that leverages multimodal data, i.e. sensor fusion, to perform 3D object detection and localization. We provide an extensive review of the advancements of deep learning-based methods in computer vision, specifically in 2D and 3D object detection tasks. We also study the progress of the literature in both single-sensor and multi-sensor data fusion techniques. Furthermore, we present an in-depth explanation of our proposed approach, which performs sensor fusion using input streams from LiDAR and camera sensors and aims to simultaneously perform 2D, 3D, and Bird's Eye View detection. Our experiments highlight the importance of learnable data fusion mechanisms and multi-task learning, the impact of different CNN design decisions, speed-accuracy trade-offs, and ways to deal with overfitting in multi-sensor data fusion frameworks.
|