131

Extraction of Key-Frames from an Unstable Video Feed

Vempati, Nikhilesh 28 September 2017 (has links) (PDF)
The APOLI project deals with Automated Power Line Inspection using highly automated Unmanned Aerial Systems. Besides the real-time damage assessment by on-board high-resolution image data exploitation, a post-processing of the video data is necessary. This master thesis deals with the implementation of an isolator detector framework and a workflow in the Automotive Data and Time-triggered Framework (ADTF) that loads a video directly from a camera or from storage and extracts the key frames which contain objects of interest. This is done by implementing an object detection system in C++ and creating ADTF filters that detect the objects of interest and extract the key frames using a supervised learning platform. The use case is the extraction of frames from video samples that contain images of isolators on power transmission lines.
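A minimal sketch of the key-frame extraction loop this abstract describes, written in Python for brevity although the thesis implementation uses C++ and ADTF filters; detect_objects is a hypothetical stand-in for the trained isolator detector, not an identifier from the thesis.

```python
import cv2

def detect_objects(frame):
    """Hypothetical detector stub: returns a list of bounding boxes for
    objects of interest in the frame. The thesis trains a supervised
    isolator detector; any detector with this interface would do."""
    # Placeholder: a real system would run the trained classifier here.
    return []

def extract_key_frames(video_path, sample_every=5):
    """Scan a video and keep only frames in which objects of interest
    are detected, mirroring the key-frame extraction workflow."""
    cap = cv2.VideoCapture(video_path)
    key_frames = []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Subsample to reduce the per-frame detection cost.
        if index % sample_every == 0 and detect_objects(frame):
            key_frames.append((index, frame))
        index += 1
    cap.release()
    return key_frames
```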
132

Automatic Eartag Recognition on Dairy Cows in Real Barn Environment

Ilestrand, Maja January 2017 (has links)
All dairy cows in Europe wear unique identification tags in their ears. These eartags are standardized and contain the cow's identification number, which today is only used for visual identification by the farmer. The cow also needs to be identified by an automatic identification system connected to milking machines and other robotics used at the farm. Currently this is solved with a non-standardized radio transmitter, which can be placed at different places on the cow, and different receivers need to be used at different farms. Other drawbacks of the currently used identification system are that it is expensive and unreliable. This thesis explores the possibility of replacing this non-standardized radio-frequency-based identification system with a standardized computer-vision-based system. The method proposed in this thesis uses a color-threshold approach for detection; a flood-fill approach followed by a Hough transform and a projection method for segmentation; and evaluates template matching, k-nearest neighbour, and support vector machines as optical character recognition methods. The results of the thesis show that the quality of the data used as input to the system is vital. With good data, k-nearest neighbour, which showed the best results of the three OCR approaches, correctly handles 98% of the digits.
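A rough sketch of the detection and OCR stages described above, assuming a color-thresholdable eartag (a yellow hue range is used purely as an illustrative assumption) and pre-segmented digit patches; the threshold values and training data are placeholders, not values from the thesis.

```python
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def detect_eartag(frame_bgr):
    """Color-threshold detection: return the bounding box (x, y, w, h)
    of the largest region matching the assumed eartag color."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (20, 80, 80), (35, 255, 255))  # assumed yellow range
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    return cv2.boundingRect(max(contours, key=cv2.contourArea))

def train_digit_knn(digit_images, labels, k=3):
    """k-nearest-neighbour OCR on segmented digit patches (the best of
    the three OCR methods evaluated in the thesis)."""
    X = np.array([img.flatten() for img in digit_images], dtype=np.float32)
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, labels)
    return knn
```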
133

A Scalable Multimedia Content Processing Framework with Application to TV Shopping

Fleites, Fausto C 12 May 2014 (has links)
The advent of smart TVs has reshaped the TV-consumer interaction by combining TVs with mobile-like applications and access to the Internet. However, consumers are still unable to seamlessly interact with the content being streamed. An example of such a limitation is TV shopping, in which a consumer purchases a product or item displayed in the current TV show. Currently, consumers can only stop the current show and attempt to find a similar item on the Web or in an actual store. It would be more convenient if the consumer could interact with the TV to purchase interesting items. Towards the realization of TV shopping, this dissertation proposes a scalable multimedia content processing framework. Two main challenges in TV shopping are addressed: the efficient detection of products in the content stream, and the retrieval of similar products given a consumer-selected product. The proposed framework consists of three components. The first component performs computationally and temporally aware multimedia abstraction to select a reduced number of frames that summarize the important information in the video stream. By both reducing the number of frames and taking into account the computational cost of the subsequent detection phase, this component allows the efficient detection of products in the stream. The second component realizes the detection phase. It executes scalable product detection using multi-cue optimization. Additional information cues are formulated into an optimization problem that allows the detection of complex products, i.e., those that do not have a rigid form and can appear in various poses. After the second component identifies products in the video stream, the consumer can select an interesting one, for which similar products must be located in a product database. To this end, the third component of the framework consists of an efficient, multi-dimensional, tree-based indexing method for multimedia databases. The proposed index mechanism serves as the backbone of the search. Moreover, it is able to efficiently bridge the semantic gap and perception subjectivity issues during the retrieval process to provide more relevant results.
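A simplified sketch of the abstraction component's idea: keep a frame only when it is sufficiently novel relative to the last kept frame, under a frame budget that stands in for the computational cost of the downstream detection phase. The histogram-distance novelty measure and the threshold are illustrative assumptions, not the dissertation's actual criteria.

```python
import cv2

def frame_histogram(frame):
    """Coarse color histogram as a cheap frame descriptor."""
    hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def abstract_video(video_path, budget=50, novelty_threshold=0.3):
    """Keep a frame only if it differs enough from the last kept frame,
    stopping once the frame budget (a proxy for detection cost) is hit."""
    cap = cv2.VideoCapture(video_path)
    kept, last_hist = [], None
    idx = 0
    while len(kept) < budget:
        ok, frame = cap.read()
        if not ok:
            break
        h = frame_histogram(frame)
        if last_hist is None or cv2.compareHist(
                last_hist, h, cv2.HISTCMP_BHATTACHARYYA) > novelty_threshold:
            kept.append(idx)
            last_hist = h
        idx += 1
    cap.release()
    return kept
```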
134

SLAMIt, A Sub-Map Based SLAM System: On-line creation of a multi-leveled map

Holmquist, Karl January 2017 (has links)
In many situations after a big catastrophe, such as the one in Fukushima, the disaster area is highly dangerous for humans to enter. It is in such environments that a semi-autonomous robot could limit the risks to humans by exploring and mapping the area on its own. This thesis sets out to design and implement a software-based SLAM system with the potential to run in real time, using a Kinect 2 sensor as input. The focus of the thesis has been to create a system that allows efficient storage and representation of the map, in order to be able to explore large environments. This is done by separating the map into different abstraction levels, corresponding to local maps connected by a global map. During the implementation, this structure has been kept in mind in order to allow modularity, making it possible to exchange each sub-component in the system if needed. The thesis is broad in the sense that it uses techniques from distinct areas to solve the sub-problems that exist; some examples are object detection and classification, point-cloud registration, and efficient 3D-based occupancy trees.
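Below is a minimal sketch of the two-level map structure this abstract describes: local metric sub-maps linked by a global graph. The class and field names are illustrative assumptions, not identifiers from the thesis implementation.

```python
from dataclasses import dataclass, field

@dataclass
class LocalMap:
    """A local metric sub-map, e.g. an occupancy grid or octree anchored
    at a reference pose."""
    map_id: int
    origin_pose: tuple          # (x, y, z, qx, qy, qz, qw) in world frame
    occupancy: dict = field(default_factory=dict)  # sparse voxel -> probability

@dataclass
class GlobalMap:
    """The global level: a graph whose nodes are sub-maps and whose edges
    are relative pose constraints between them."""
    submaps: dict = field(default_factory=dict)    # map_id -> LocalMap
    edges: list = field(default_factory=list)      # (id_a, id_b, rel_pose)

    def add_submap(self, submap, connect_to=None, rel_pose=None):
        self.submaps[submap.map_id] = submap
        if connect_to is not None:
            self.edges.append((connect_to, submap.map_id, rel_pose))
```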
135

Machine Learning Methods for Visual Object Detection

Hussain, Sabit ul 07 December 2011 (has links)
The goal of this thesis is to develop better practical methods for detecting common object classes in real-world images. We present a family of object detectors that combine Histogram of Oriented Gradients (HOG), Local Binary Pattern (LBP) and Local Ternary Pattern (LTP) features with efficient Latent SVM classifiers and effective dimensionality reduction and sparsification schemes to give state-of-the-art performance on several important datasets, including PASCAL VOC2006 and VOC2007, INRIA Person and ETHZ. The three main contributions are as follows. Firstly, we pioneer the use of Local Ternary Pattern features for object detection, showing that LTP gives better overall performance than HOG and LBP because it captures both rich local texture and object shape information while being resistant to variations in lighting conditions. It thus works well both for classes that are recognized mainly by their structure and for ones that are recognized mainly by their textures. We also show that HOG, LBP and LTP complement one another, so that an extended feature set that incorporates all three of them gives further improvements in performance. Secondly, in order to tackle the speed and memory usage problems associated with high-dimensional modern feature sets, we propose two effective dimensionality reduction techniques. The first, feature projection using Partial Least Squares, allows detectors to be trained more rapidly with negligible loss of accuracy and no loss of run-time speed for linear detectors. The second, feature selection using SVM weight truncation, allows active feature sets to be reduced in size by almost an order of magnitude with little or no loss, and often a small gain, in detector accuracy. Despite its simplicity, this feature selection scheme outperforms all of the other sparsity-enforcing methods that we have tested. Lastly, we describe work in progress on Local Quantized Patterns (LQP), a generalized form of local pattern features that uses lookup-table-based vector quantization to provide local-pattern-style pixel neighbourhood codings that have the speed of LBP/LTP and some of the flexibility and power of traditional visual word representations. Our experiments show that LQP outperforms all of the other feature sets tested, including HOG, LBP and LTP.
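As a concrete illustration of the LTP feature highlighted above, here is a small sketch of the standard Local Ternary Pattern coding for a single 3x3 neighbourhood: each neighbour is coded +1/0/-1 against the centre pixel with tolerance t, and the ternary code is split into the usual pair of binary "upper" and "lower" patterns. The tolerance value is a placeholder, not the thesis's setting.

```python
import numpy as np

def ltp_codes(patch, t=5):
    """patch: 3x3 grayscale array; returns (upper, lower) 8-bit codes."""
    center = patch[1, 1]
    # Neighbours in a fixed clockwise order around the centre pixel.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2),
               (2, 2), (2, 1), (2, 0), (1, 0)]
    upper = lower = 0
    for bit, (r, c) in enumerate(offsets):
        diff = int(patch[r, c]) - int(center)
        if diff > t:         # ternary code +1 -> bit in the upper pattern
            upper |= 1 << bit
        elif diff < -t:      # ternary code -1 -> bit in the lower pattern
            lower |= 1 << bit
    return upper, lower

# Example: codes for a synthetic neighbourhood.
patch = np.array([[90, 100, 120], [80, 100, 130], [70, 100, 110]], np.uint8)
print(ltp_codes(patch))
```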
136

Object detection in video using mixtures of deformable part models obtained from an image set

Leissi Margarita Castaneda Leon 23 October 2012 (has links)
The problem of detecting objects that belong to a specific class in videos is widely studied due to its potential applications. For example, for videos taken from a stationary camera, we can mention applications such as security and traffic surveillance; when the video has been taken from a dynamic camera, a possible application is driver assistance. The literature presents several different approaches that treat each of the mentioned cases indiscriminately and consider only images obtained from a stationary or dynamic camera to train the detectors. These approaches can lead to poor performance when the techniques are used on sequences of images from different types of camera.
The state of the art in the detection of objects that belong to a specific class shows a tendency toward the use of histograms and supervised training, and basically follows the structure: object-class model construction, detection of candidates in the image/frame, and application of a distance measure to those candidates. Another disadvantage is that some approaches use several models for each viewpoint of the object, generating many models and, in some cases, one classifier for each viewpoint. In this work, we approach the problem of object detection using a model of the object class created with a dataset of static images, and we use the model to detect objects in videos (sequences of images) that were collected from static and dynamic cameras, i.e., in a totally different setting than used for training. The creation of the model is done in an off-line learning phase, using the PASCAL 2007 image dataset. The model is based on a mixture of deformable part models (MDPM), originally proposed by Felzenszwalb et al. (2010b) for detection in static images. We do not limit the model to any specific viewpoint. A set of experiments was designed to explore the best number of mixture components, as well as the number of parts of the model. In addition, we performed a comparative study of symmetric and asymmetric MDPMs. We evaluated the proposed method for detecting people and cars in videos obtained by a static or a dynamic camera. Our results not only show good performance of the MDPM and better results than state-of-the-art approaches to object detection in videos obtained from a stationary or dynamic camera, but also show the best number of mixture components and parts for the created model. Finally, the results show some differences between symmetric and asymmetric MDPMs in the detection of objects in different videos.
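To make the MDPM scoring idea concrete, the following schematic sketch follows the general form of the Felzenszwalb et al. (2010b) model: a component scores a hypothesis as its root filter response plus, for each part, the best placement trading appearance score against a quadratic deformation cost, and the mixture takes the maximum over its components. The names and the brute-force placement search are simplifications for illustration; the original method uses generalized distance transforms for efficiency.

```python
import numpy as np

def part_score(appearance, anchor, deform_weights):
    """Best placement of one part: appearance is a 2D response map,
    anchor the expected (row, col); deform_weights (a, b, c, d) penalise
    quadratic displacement from the anchor."""
    a, b, c, d = deform_weights
    best = -np.inf
    rows, cols = appearance.shape
    for r in range(rows):
        for col in range(cols):
            dr, dc = r - anchor[0], col - anchor[1]
            cost = a * dr * dr + b * dr + c * dc * dc + d * dc
            best = max(best, appearance[r, col] - cost)
    return best

def component_score(root_response, parts, bias):
    """Score of one mixture component at a fixed root location;
    parts is a list of (appearance_map, anchor, deform_weights)."""
    return root_response + sum(
        part_score(app, anc, w) for app, anc, w in parts) + bias

def mixture_score(component_scores):
    """An MDPM assigns each hypothesis the best component's score."""
    return max(component_scores)
```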
137

Edge Machine Learning for Animal Detection, Classification, and Tracking

Tydén, Amanda, Olsson, Sara January 2020 (has links)
A research field currently advancing is the use of machine learning on camera trap data, yet few explore deep learning for camera traps to be run in real time. A camera trap has the purpose of capturing images of bypassing animals and is traditionally based only on motion detection. This work integrates machine learning on the edge device to also perform object detection. Related research is brought up and model tests are performed with a focus on the trade-off between inference speed and model accuracy. Transfer learning is used to take advantage of pre-trained models and thus reduce training time and the amount of training data. Four models with slightly different architectures are compared to evaluate which one performs best for the use case: SSD MobileNet V2, SSD Inception V2, SSDLite MobileNet V2, and quantized SSD MobileNet V2. Given the client-side usage of the model, SSD MobileNet V2 was finally selected due to a satisfying trade-off between inference speed and accuracy. Even though it is less accurate in its detections, its ability to process more images per second makes it outperform the more accurate Inception network in object tracking. A contribution of this work is a lightweight tracking solution using tubelet proposals. This work further discusses the open-set recognition problem, where just a few object classes are of interest while many others are present. The subject of open-set recognition influences data collection and evaluation tests; it is, however, left for further work to research how to integrate support for open-set recognition in object detection models. The proposed system handles detection, classification, and tracking of animals in the African savannah, and has potential for real usage as it produces meaningful events.
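A generic sketch of the transfer-learning pattern described above: freeze a backbone pre-trained on a large dataset and train only a new head for the target classes. The thesis fine-tunes SSD detection models; this minimal classification example with a Keras MobileNetV2 backbone only illustrates the same freeze-and-retrain idea, and the class count is a placeholder.

```python
import tensorflow as tf

NUM_CLASSES = 4  # e.g. a handful of savannah species; placeholder value

# Backbone pre-trained on ImageNet, with its weights frozen.
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
backbone.trainable = False  # reuse pre-trained features unchanged

# Only the new head is trained, which cuts training time and data needs.
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```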
138

Sensor Fusion for 3D Object Detection for Autonomous Vehicles

Massoud, Yahya 14 October 2021 (has links)
Thanks to major advancements in hardware and computational power, sensor technology, and artificial intelligence, the race for fully autonomous driving systems is heating up. With a countless number of challenging conditions and driving scenarios, researchers are tackling the most challenging problems in driverless cars. One of the most critical components is the perception module, which enables an autonomous vehicle to "see" and "understand" its surrounding environment. Given that modern vehicles can have a large number of sensors and available data streams, this thesis presents a deep learning-based framework that leverages multimodal data (i.e., sensor fusion) to perform the task of 3D object detection and localization. We provide an extensive review of the advancements of deep learning-based methods in computer vision, specifically in 2D and 3D object detection tasks. We also study the progress of the literature in both single-sensor and multi-sensor data fusion techniques. Furthermore, we present an in-depth explanation of our proposed approach, which performs sensor fusion using input streams from LiDAR and camera sensors, aiming to simultaneously perform 2D, 3D, and Bird's Eye View detection. Our experiments highlight the importance of learnable data fusion mechanisms and multi-task learning, the impact of different CNN design decisions, speed-accuracy trade-offs, and ways to deal with overfitting in multi-sensor data fusion frameworks.
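As one hedged illustration of what a learnable fusion mechanism can look like, the sketch below gates camera and LiDAR feature maps with a learned, content-dependent weighting. It assumes both feature maps already share the same spatial resolution and channel count; the layer is an illustrative design, not the architecture proposed in the thesis.

```python
import tensorflow as tf

class GatedFusion(tf.keras.layers.Layer):
    """Fuse two feature maps with a learned, content-dependent gate."""
    def __init__(self, channels):
        super().__init__()
        self.gate = tf.keras.layers.Conv2D(channels, 1, activation="sigmoid")
        self.project = tf.keras.layers.Conv2D(channels, 1)

    def call(self, camera_feat, lidar_feat):
        # The gate decides, per location and channel, how much each
        # modality contributes to the fused representation.
        g = self.gate(tf.concat([camera_feat, lidar_feat], axis=-1))
        return self.project(g * camera_feat + (1.0 - g) * lidar_feat)

# Example: fuse two 32x32 feature maps with 64 channels each.
cam = tf.random.normal([1, 32, 32, 64])
lid = tf.random.normal([1, 32, 32, 64])
fused = GatedFusion(64)(cam, lid)
```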
139

Object detection for signal separation with different time-frequency representations

Strydom, Llewellyn January 2021 (has links)
The task of detecting and separating multiple radio-frequency signals in a wideband scenario has attracted much interest recently, especially from the cognitive radio community. Many successful approaches in this field have been based on machine learning and computer vision methods using the wideband signal spectrogram as an input feature. YOLO and R-CNN are deep learning-based object detection algorithms that have been used to obtain state-of-the-art results on several computer vision benchmark tests. In this work, YOLOv2 and Faster R-CNN are implemented, trained, and tested to solve the signal separation task. Previous signal separation research does not consider representations other than the spectrogram; here, specific focus is placed on investigating different time-frequency representations based on the short-time Fourier transform. Results are presented in terms of traditional object detection metrics, with Faster R-CNN and YOLOv2 achieving mean average precision scores of up to 89.3% and 88.8% respectively. / Dissertation (MEng (Computer Engineering))--University of Pretoria, 2017. / Saab Grintek Defence / University of Pretoria / Electrical, Electronic and Computer Engineering / MEng (Computer Engineering) / Unrestricted
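A small sketch of how several time-frequency representations can be derived from the same short-time Fourier transform, which is the kind of representation variation the dissertation investigates; the window length and the example sample rate are placeholder values.

```python
import numpy as np
from scipy.signal import stft

def tf_representations(signal, fs, nperseg=256):
    """Return several time-frequency maps derived from one STFT."""
    f, t, Z = stft(signal, fs=fs, nperseg=nperseg)
    magnitude = np.abs(Z)
    power = magnitude ** 2
    log_mag = 20.0 * np.log10(magnitude + 1e-12)  # dB scale; avoids log(0)
    return {"magnitude": magnitude, "power": power, "log": log_mag}

# Example: a wideband capture at a 10 MHz sample rate (placeholder values).
samples = np.random.randn(100_000)
reps = tf_representations(samples, fs=10e6)
```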
140

A Transfer Learning Approach to Object Detection Acceleration for Embedded Applications

Vance, Lauren M. 08 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Deep learning solutions to computer vision tasks have revolutionized many industries in recent years, but embedded systems have too many restrictions to take advantage of current state-of-the-art configurations. Typical embedded processor hardware configurations must meet very low power and memory constraints to maintain small and lightweight packaging, and the architectures of the current best deep learning models are too computationally intensive for these hardware configurations. Current research shows that convolutional neural networks (CNNs) can be deployed, with a few architectural modifications, on Field-Programmable Gate Arrays (FPGAs), resulting in minimal loss of accuracy, similar or decreased processing speeds, and lower power consumption when compared to general-purpose Central Processing Units (CPUs) and Graphics Processing Units (GPUs). This research contributes further to these findings with the FPGA implementation of a YOLOv4 object detection model that was developed with the use of transfer learning. The transfer-learned model uses the weights of a model pre-trained on the MS-COCO dataset as a starting point and fine-tunes only the output layers for detection of more specific objects of five classes. The model architecture was then modified slightly for compatibility with the FPGA hardware, using techniques such as weight quantization and replacing unsupported activation layer types. The model was deployed on three different hardware setups (CPU, GPU, FPGA) for inference on a test set of 100 images. It was found that the FPGA was able to achieve real-time inference speeds of 33.77 frames per second, a speedup of 7.74 frames per second when compared to GPU deployment. The model also consumed 96% less power than a GPU configuration, with only approximately 4% average loss in accuracy across all five classes. The results are even more striking when compared to CPU deployment, with a 131.7-times speedup in inference throughput. CPUs have long since been outperformed by GPUs for deep learning applications but are used in most embedded systems. These results further illustrate the advantages of FPGAs for deep learning inference on embedded systems, even when transfer learning is used for an efficient end-to-end deployment process. This work advances the current state of the art with the implementation of a YOLOv4 object detection model developed with transfer learning for FPGA deployment.
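As a rough illustration of the weight-quantization step mentioned above, the sketch below applies symmetric per-tensor quantization to map float weights to 8-bit integers. Real FPGA toolchains are considerably more involved; this only shows the core idea, and all values are illustrative.

```python
import numpy as np

def quantize_weights(w, num_bits=8):
    """Map float weights to signed integers with a single scale factor."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = np.max(np.abs(w)) / qmax          # per-tensor symmetric scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for accuracy checks."""
    return q.astype(np.float32) * scale

# Example: quantization error on random weights.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_weights(w)
err = np.mean(np.abs(w - dequantize(q, s)))
print(f"mean absolute quantization error: {err:.6f}")
```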
