  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Detecção de objetos em vídeos usando misturas de modelos baseados em partes deformáveis obtidas de um conjunto de imagens / Object detection in video using mixtures of deformable part models obtained from a image set

Castaneda Leon, Leissi Margarita 23 October 2012
Detecting objects of a given class in video is a widely studied problem because of its potential applications. Video from a stationary camera supports applications such as security and traffic surveillance, while video from a moving camera supports driver assistance, among others. The literature offers different methods for each of these cases, and they train their detectors only on images obtained from a single type of camera. This can lead to poor performance when a technique is applied to video from a different type of camera. The state of the art in single-class object detection tends to rely on histograms and supervised training, and basically follows this structure: construction of the object class model, detection of candidates in an image or frame, and application of a measure to those candidates. Another disadvantage is that some approaches use a different model for each viewpoint of an object, which generates many models and, in some cases, one classifier per viewpoint.
In this dissertation, we approach object detection with a model of the object class built from a dataset of static images, and then use that model to detect objects in videos (image sequences) collected from stationary and moving cameras, i.e., in a setting entirely different from the one used for training. The model is created in an off-line learning phase using the PASCAL 2007 image set. It is based on a mixture of deformable part models (MDPM), originally proposed by Felzenszwalb et al. (2010b) for object detection in static images. We do not restrict the model to any specific viewpoint. A set of experiments explores the best number of mixture components and the number of parts in the model. In addition, we carried out a comparative study of symmetric and asymmetric MDPMs. We tested the method on detecting people and cars in videos obtained from stationary and moving cameras. Our results show not only that the MDPM performs well and outperforms state-of-the-art approaches to object detection in such videos, but also which number of mixture components and parts works best for the created model. Finally, the results show some differences between symmetric and asymmetric MDPMs when detecting objects in different videos.
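For readers unfamiliar with this model family, the scoring rule of a deformable part model can be written out as below. This is a sketch of the formulation as it is usually stated following Felzenszwalb et al., not something taken from the dissertation itself; the symbols $F_i$, $\phi$, $d_i$, $b$, $n$ and $m$ are notation assumed here for illustration.

```latex
% Score of one component with a root filter (i = 0) and n part filters,
% placed at positions p_0, ..., p_n in a feature pyramid H.
\[
  \mathrm{score}(p_0, \dots, p_n)
    = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i)
    - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i)
    + b,
  \qquad
  \phi_d(dx, dy) = (dx,\, dy,\, dx^2,\, dy^2).
\]
% Each component c is scored at a root location by its best part placement,
% and a mixture with m components keeps the best-scoring component.
\[
  \mathrm{score}_c(p_0) = \max_{p_1, \dots, p_n} \mathrm{score}_c(p_0, \dots, p_n),
  \qquad
  \mathrm{score}_{\mathrm{mixture}}(p_0) = \max_{1 \le c \le m} \mathrm{score}_c(p_0).
\]
```

Here $(dx_i, dy_i)$ is the displacement of part $i$ from its anchor position, so the second sum penalizes deformation. The number of components $m$ and the number of parts $n$ correspond to the quantities the experiments in this abstract vary.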
3

Automatic vertebrae detection and labeling in sagittal magnetic resonance images

Andersson, Daniel January 2015
Radiologists often have limited time to complete their work, and their workload keeps increasing. A picture archiving and communication system (PACS) is a platform for daily image review that improves their work environment; spinal MR images, for example, can be reviewed on it. When reviewing spinal images a radiologist wants vertebrae labels, and Sectra's PACS platform offers a good opportunity to implement an automatic method for spinal labeling. This thesis presents such a method, called a vertebrae classifier. The method should remove the need for radiologists to label the spine manually, and it could be implemented in Sectra's PACS software to improve radiologists' overall work experience. Spine labeling is the process of marking vertebra centres with a name on a spinal image. The method proposed in this thesis was developed using a machine learning approach for vertebrae detection in sagittal MR images. The developed classifier works for both the lumbar and the cervical spine, but it is optimized for the lumbar spine. During development, three different methods for vertebrae detection were evaluated. Detection is done on multiple sagittal slices. The output of the detection is then labeled using a pictorial-structure-based algorithm, which uses a trained model of the spine to determine the correct labeling. The suggested method achieves 99.6% recall and 99.9% precision for the lumbar spine. Performance on the cervical spine is slightly worse, at 98.1% for both recall and precision. For the lumbar spine, this result was achieved by training the proposed method on 43 images and validating it on 89 images. The cervical spine was validated using 26 images. These results are promising, especially for the lumbar spine. However, further evaluation is needed to test the method in a clinical setting.
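The recall and precision figures quoted in this abstract follow the usual detection definitions, which can be sketched as below. The function name and the counts are hypothetical and chosen only for illustration; they are not data from the thesis, and how the thesis decides that a detection matches a vertebra is not stated here.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from detection counts.

    precision = TP / (TP + FP): share of reported detections that are correct.
    recall    = TP / (TP + FN): share of real objects that were detected.
    Both default to 0.0 when their denominator is zero.
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not results from the thesis).
p, r = precision_recall(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={p:.3f} recall={r:.3f}")
```

A detector can trade one measure against the other by changing its acceptance threshold, which is why abstracts such as this one report both.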
4

Visual Representations and Models: From Latent SVM to Deep Learning

Azizpour, Hossein January 2016
Two important components of a visual recognition system are representation and model. Both involve selecting and learning the features that are indicative for recognition and discarding those that are uninformative. This thesis proposes different techniques within the frameworks of two learning systems for representation and modeling, namely latent support vector machines (latent SVMs) and deep learning. First, we propose various approaches to group the positive samples into clusters of visually similar instances. Given a fixed representation, the sampled space of the positive distribution is usually structured. The proposed clustering techniques include a novel similarity measure based on exemplar learning, an approach for using additional annotation, and an augmentation of latent SVM that automatically finds clusters whose members can be reliably distinguished from the background class. In another effort, a strongly supervised deformable part model (DPM) is suggested to study how these models can benefit from privileged information. The extra information comes in the form of semantic part annotations (i.e., their presence and location), which are used to constrain the DPM's latent variables during or prior to the optimization of the latent SVM. Its effectiveness is demonstrated on the task of animal detection. Finally, we generalize the formulation of discriminative latent variable models, including DPMs, to incorporate a new set of latent variables representing the structure or properties of negative samples; we therefore term them negative latent variables. We show that this generalization affects state-of-the-art techniques and helps visual recognition by explicitly searching for counter-evidence of an object's presence. Following the resurgence of deep networks, the last works of this thesis focus on deep learning in order to produce a generic representation for visual recognition.
A Convolutional Network (ConvNet) is trained on a large annotated image classification dataset called ImageNet, with $\sim1.3$ million images. The activations at each layer of the trained ConvNet can then be treated as the representation of an input image. We show that such a representation is surprisingly effective for various recognition tasks, making it clearly superior to all the handcrafted features previously used in visual recognition (such as HOG in our first works on DPM). We further investigate how this representation can be improved for a given task. We propose various factors, applied before or after the training of the representation, that can improve the efficacy of the ConvNet representation. These factors are analyzed on 16 datasets from various subfields of visual recognition.
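The latent-variable scoring that latent SVMs and DPMs share can be illustrated with a small sketch: an example is scored by the best dot product between the weight vector and the features of each latent configuration. The function name, weights, and feature vectors below are hypothetical, and the sketch covers only inference, not the training procedure or the negative latent variables proposed in the thesis.

```python
def latent_score(weights, candidate_features):
    """Score an example as the maximum of w . phi(x, z) over latent choices z.

    candidate_features maps each latent configuration z (for example a part
    placement or cluster assignment) to its feature vector phi(x, z).
    Returns the best configuration and its score; ties keep the first one seen.
    """
    best_z, best_score = None, float("-inf")
    for z, features in candidate_features.items():
        score = sum(w * f for w, f in zip(weights, features))
        if score > best_score:
            best_z, best_score = z, score
    return best_z, best_score


# Hypothetical weights and features, for illustration only.
weights = [1.0, -2.0, 0.5]
candidates = {
    "left": [1.0, 0.0, 2.0],    # 1.0 + 0.0 + 1.0 = 2.0
    "center": [2.0, 1.0, 0.0],  # 2.0 - 2.0 + 0.0 = 0.0
    "right": [0.0, 0.0, 2.0],   # 0.0 + 0.0 + 1.0 = 1.0
}
print(latent_score(weights, candidates))
```

Because the score is a maximum over configurations, training such a model typically alternates between choosing latent values for the positive examples and updating the weights.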
5

Weakly supervised learning of deformable part models and convolutional neural networks for object detection / Détection d'objets faiblement supervisée par modèles de pièces déformables et réseaux de neurones convolutionnels

Tang, Yuxing 14 December 2016
In this dissertation we address the problem of weakly supervised object detection, where the goal is to recognize and localize objects in images when only incomplete object-level annotations are available during training. To this end, we propose two methods that learn two different models for the objects of interest. In our first method, we enhance weakly supervised Deformable Part-based Models (DPMs) by emphasizing the importance of the location and size of the initial class-specific root filter. We first compute a candidate pool that represents the potential locations of the object for this initial root filter, using a generic objectness measure (region proposals) to combine the most salient regions with good-quality region proposals. We then cast learning the latent class label of each candidate window as a binary classification problem, training category-specific classifiers that coarsely classify a candidate window as either a target object or a non-target class. Furthermore, we improve detection by incorporating contextual information from image classification scores. Finally, we design a flexible enlarging-and-shrinking post-processing procedure that modifies the DPM outputs so that they better match the approximate object aspect ratios, further improving final accuracy.
Second, we investigate how knowledge about object similarities from both the visual and semantic domains can be transferred to adapt an image classifier into an object detector in a semi-supervised setting on a large-scale database, where only a subset of the object categories is annotated with bounding boxes. We propose to transform image-level classifiers based on deep Convolutional Neural Networks (CNNs) into object detectors by modeling the differences between the two on categories that have both image-level and bounding-box annotations, and then transferring this information to convert classifiers into detectors for categories without bounding-box annotations. We have evaluated both approaches extensively on several challenging detection benchmarks, e.g., PASCAL VOC, ImageNet ILSVRC and Microsoft COCO. Both approaches compare favorably to the state of the art and show significant improvement over several other recent weakly supervised detection methods.
6

Mid-level representations for modeling objects / Représentations de niveau intermédiaire pour la modélisation d'objets

Tsogkas, Stavros 15 January 2016
In this thesis we propose the use of mid-level representations, in particular i) medial axes, ii) object parts, and iii) convolutional features, for modeling objects. The first part of the thesis deals with detecting medial axes in natural RGB images. We adopt a learning approach, utilizing colour, texture and spectral clustering features to build a classifier that produces a dense probability map for symmetry. Multiple Instance Learning (MIL) allows us to treat scale and orientation as latent variables during training, while a variant based on random forests offers significant gains in running time.
In the second part of the thesis we focus on object part modeling with deformable part models (DPMs), using both hand-crafted and learned feature representations. We develop a coarse-to-fine, hierarchical approach that uses probabilistic bounds on part scores to decrease the computational cost of mixture models with a large number of HOG-based templates. These efficiently computed probabilistic bounds allow us to quickly discard large parts of the image and evaluate the exact convolution scores only at promising locations. Our approach achieves a $4\times$ to $5\times$ speedup over the naive approach with minimal loss in performance. We also employ convolutional features to improve object detection. We use a popular CNN architecture to extract responses from an intermediate convolutional layer. We integrate these responses into the classic DPM pipeline, replacing the hand-crafted HOG features, and observe a significant boost in detection performance (~14.5% increase in mAP).
In the last part of the thesis we experiment with fully convolutional neural networks for the segmentation of object parts. We re-purpose a state-of-the-art CNN to perform fine-grained semantic segmentation of object parts and use a fully connected CRF as a post-processing step to obtain sharp boundaries. We also inject prior shape information into our model through a Restricted Boltzmann Machine (RBM) trained on ground-truth segmentations. Finally, we train a new fully convolutional architecture from a random initialization to segment different parts of the human brain in magnetic resonance image data. Our methods achieve state-of-the-art results on both types of data.
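The coarse-to-fine idea described for the DPM part of this thesis, discarding locations with a cheap bound before computing exact scores, can be sketched as follows. This is a minimal illustration under stated assumptions: `cheap_bound` and `exact_score` are hypothetical callables, the scores are made up, and the bound here is treated as a strict upper bound, whereas the thesis uses probabilistic bounds that need only hold with high probability.

```python
def prune_then_score(locations, cheap_bound, exact_score, threshold):
    """Evaluate exact_score only where cheap_bound reaches the threshold.

    cheap_bound(loc) is assumed to be an upper bound on exact_score(loc), so
    skipping locations whose bound is below the threshold loses no detections.
    Returns the detections and the number of exact evaluations performed.
    """
    detections = []
    exact_evaluations = 0
    for loc in locations:
        if cheap_bound(loc) < threshold:
            continue  # bound too low: the exact score cannot reach the threshold
        exact_evaluations += 1
        score = exact_score(loc)
        if score >= threshold:
            detections.append((loc, score))
    return detections, exact_evaluations


# Toy 1-D example with made-up scores; each bound is >= the exact score.
exact = {0: 0.1, 1: 0.9, 2: 0.3, 3: 0.7, 4: 0.2}
bound = {0: 0.2, 1: 1.0, 2: 0.6, 3: 0.8, 4: 0.4}
detections, n_exact = prune_then_score(range(5), bound.get, exact.get, threshold=0.5)
print(detections, n_exact)
```

In this toy run only three of the five locations reach the expensive step. With a probabilistic bound some true detections can occasionally be discarded, which is consistent with the "minimal loss in performance" the abstract reports.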
