1 |
Studies on support vector machines and applications to video object extraction
Liu, Yi 22 September 2006
No description available.
|
2 |
Segmentation spatio-temporelle et indexation vidéo dans le domaine des représentations hiérarchiques / Spatio-temporal segmentation and video indexing in the domain of hierarchical representations
Morand, Claire 25 November 2009
L'objectif de cette thèse est de proposer une solution d'indexation "scalable" et basée objet de flux vidéos HD compressés avec Motion JPEG2000. Dans ce contexte, d'une part, nous travaillons dans le domaine transformé hiérarchique des ondelettes 9/7 de Daubechies et, d'autre part, la représentation "scalable" nécessite des méthodes en multirésolution, de basse résolution vers haute résolution. La première partie de ce manuscrit est dédiée à la définition d'une méthode d'extraction automatique des objets en mouvement. Elle repose sur la combinaison d'une estimation du mouvement global robuste et d'une segmentation morphologique couleur à basse résolution. Le résultat est ensuite affiné en suivant l'ordre des données dans le flux scalable. La deuxième partie est consacrée à la définition d'un descripteur sur les objets précédemment extraits, basé sur les histogrammes en multirésolution des coefficients d'ondelettes. Enfin, les performances de la méthode d'indexation proposée sont évaluées dans le contexte de requêtes scalables de recherche de vidéos par le contenu. / This thesis proposes a scalable, object-based indexing solution for HD video streams compressed with Motion JPEG2000. In this context, on the one hand, we work in the hierarchical transform domain of the Daubechies 9/7 wavelets and, on the other hand, the scalable representation calls for multiresolution methods that proceed from low to high resolution. The first part of this manuscript is dedicated to defining a method for the automatic extraction of moving objects. It is based on the combination of a robust global motion estimation with a morphological color segmentation at low resolution. The result is then refined following the order of the data in the scalable stream. The second part defines a descriptor for the previously extracted objects, based on multiresolution histograms of the wavelet coefficients.
Finally, the performance of the proposed indexing method is evaluated in the context of scalable content-based video retrieval queries.
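The descriptor idea in the abstract above (one histogram of wavelet coefficients per resolution level) can be illustrated with a small sketch. It rests on several assumptions that are not in the thesis: it works on a 1D signal, it substitutes the simple Haar wavelet for the Daubechies 9/7 wavelet, and the function names, bin count and value range are hypothetical choices made only for illustration.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)  # a trailing odd sample is dropped
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def histogram(values, bins, lo, hi):
    """Normalized histogram of values over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # clamp outliers into the edge bins
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else counts


def multiscale_descriptor(signal, levels, bins=4, lo=-4.0, hi=4.0):
    """One detail-coefficient histogram per level, coarsest level first.

    Haar is an illustrative stand-in; the thesis works with 9/7 wavelets.
    """
    per_level = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        per_level.append(histogram(detail, bins, lo, hi))
    return per_level[::-1]  # low resolution first, then finer levels
```

Ordering the histograms from coarse to fine mirrors the low-to-high-resolution order that the abstract associates with a scalable stream, so a query could compare only the first few histograms when only low-resolution data has been decoded.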
|
3 |
Object Extraction for Virtual-viewpoint Video Synthesis / 仮想視点映像の合成を目的としたオブジェクト抽出
Sankoh, Hiroshi 25 May 2015
京都大学 / 0048 / 新制・課程博士 / 博士(情報学) / 甲第19202号 / 情博第586号 / 新制||情||102(附属図書館) / 32194 / 京都大学大学院情報学研究科知能情報学専攻 / (主査)教授 美濃 導彦, 教授 松山 隆司, 教授 田中 克己 / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM
|
4 |
Object Extraction From Images/videos Using A Genetic Algorithm Based Approach
Yilmaz, Turgay 01 January 2008
The increasing use of digital video and images has shown the need for modeling and querying their semantic content. Manual annotation of semantic content is costly in time and limits querying capabilities. Content-based information retrieval in the multimedia domain therefore needs to extract the semantic content automatically. The semantic content is usually defined by the objects in images/videos. In this thesis, a Genetic Algorithm (GA) based object extraction and classification mechanism is proposed for extracting the content of videos and images. Object extraction is defined as a classification problem, and a GA-based classifier is proposed for it. Candidate objects are extracted from videos/images using Normalized-cut segmentation and sent to the classifier. Objects are defined with the Best Representative and Discriminative Feature (BRDF) model, where the features are MPEG-7 descriptors. The classifier's decisions are calculated using these features and the BRDF model. The classifier improves itself over time through the genetic operations of the GA. In addition, the system supports fuzziness by making multiple categorizations and giving fuzzy decisions on the objects. Outside the base model, a statistical method for determining feature importance is proposed to generate the BRDF model of the categories automatically. A platform-independent application of the proposed system is also implemented in the thesis.
|
5 |
Poloautomatická segmentace obrazu / Semi-Automatic Image Segmentation
Horák, Jan January 2015
This work describes the design and implementation of a tool for creating photomontages. The tool is based on methods of semi-automatic image segmentation. The work outlines the problems of segmenting image data and the benefits of interaction with the user. It analyzes different approaches to interactive image segmentation, explains their principles and shows their positive and negative aspects. It also presents the advantages and disadvantages of currently used photo-editing applications. It proposes an application that creates photomontages in two steps: extraction of an object from one picture and its insertion into another. The first step uses GrabCut, a semi-automatic segmentation method based on graph theory. The work also includes a comparison between the application and other applications in which a photomontage can be created, as well as user tests of the application.
|
6 |
REALTIME MAPPING AND SCENE RECONSTRUCTION BASED ON MID-LEVEL GEOMETRIC FEATURES
Georgiev, Kristiyan January 2014
Robot mapping is a major field of research in robotics. Its basic task is to combine (register) spatial data, usually gained from range devices, into a single data set. This data set is called the global map and represents the environment as observed from different locations, usually without knowledge of their positions. Approaches can be classified into groups based on the type of sensor, e.g. lasers, Microsoft Kinect, or stereo image pairs. A major disadvantage of current methods is that they are derived from poorly scalable 2D approaches that use a small amount of data, whereas 3D sensing yields a large amount of data in each 3D scan. Autonomous mobile robots have limited computational power, which makes it harder to run 3D robot mapping algorithms in real time. To remedy this limitation, the proposed research uses mid-level geometric features (lines and ellipses) to construct 3D geometric primitives (planar patches, cylinders, spheres and cones) from 3D point data. Such 3D primitives can serve as distinct features for faster registration, allowing real-time performance on a mobile robot, e.g. using a Microsoft Kinect to detect planes at 30 frames per second. While previous approaches show insufficient performance, the proposed method operates in real time. At its core, the algorithm performs fast model fitting with a constant-time (O(1)) model update for each new data point added to the model, using a three-stage approach. The first stage inspects 1.5D subspaces to find lines and ellipses. The next stage uses these lines and ellipses as input, examining their neighborhood structure to form sets of candidates for the 3D geometric primitives. Finally, candidates are fitted to the geometric primitives. The complexity of point processing is O(n); additional time of lower order is needed for working on the significantly smaller number of mid-level objects.
The real-time performance makes this approach suitable as a pre-processing step for higher-level real-time 3D tasks in robotics, such as tracking or feature-based mapping. In this thesis, I will show how these features are derived and used for scene registration. Optimal registration is determined by finding plane-feature correspondences based on mutual similarity and geometric constraints. Our approach determines the plane correspondence in three steps. The first step computes the distance between all pairs of planes from the first scan and all pairs of planes from the second scan. The distance function captures angular, distance and co-planarity differences. The resulting distances are accumulated in a distance matrix. The next step uses the distance matrix to compute the correlation matrix between planes from the first and second scans. Finally, the plane correspondence is found by computing the globally optimal assignment from the correlation matrix. After the plane correspondence is found, an optimal pose registration is computed. In addition, I will provide a comparison with existing state-of-the-art algorithms. This work is part of an industry collaboration effort sponsored by the National Institute of Standards and Technology (NIST), aimed at performance evaluation and modeling of autonomous navigation in unstructured and dynamic environments. Additional field work, in the form of an evaluation of real robotic systems in a robot test arena, was performed. / Computer and Information Science / Accompanied by two .mp4 files.
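The constant-time model update mentioned in the abstract above can be illustrated with an ordinary least-squares line fit kept as running sums. This is only a sketch of the general technique, not the thesis's own fitting procedure: the class name, the choice of a `y = a*x + b` model (which cannot represent vertical lines), and the tolerance are all assumptions made for illustration.

```python
class IncrementalLineFit:
    """Least-squares line y = a*x + b maintained from running sums.

    Hypothetical illustration: the thesis does not specify this model.
    """

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def add(self, x, y):
        """Constant-time update: only five accumulators change per point."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def params(self):
        """Return (slope, intercept), or None if the fit is undetermined."""
        denom = self.n * self.sxx - self.sx * self.sx
        if self.n < 2 or abs(denom) < 1e-12:
            return None  # fewer than two points, or all x values equal
        a = (self.n * self.sxy - self.sx * self.sy) / denom
        b = (self.sy - a * self.sx) / self.n
        return a, b
```

Because `add` touches a fixed number of accumulators, processing n points costs O(n) in total, which matches the complexity stated in the abstract. Recovering the parameters with `params` is also constant time, so the model can be queried after every new point.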
|
7 |
Decoding Visual and Textual Elements in CSR Reports: A Systematic Analysis of Images and Text for Corporate Sustainability Insights
Weerasinghe, Julian, Batawala, Nilupa January 2024
This thesis examines the interplay of visual and textual discourse in Corporate Social Responsibility (CSR) reports, offering a systematic framework to analyse a dataset of around 66,925 images from 675 CSR reports. By analysing image attributes, colours, and objects in conjunction with textual sentiment and topics, we investigate the similarities, contrasts and trends across sectors and regions, and the impact of company characteristics. The mixed-methods approach, incorporating both qualitative image analysis and quantitative text evaluation, reveals patterns in how CSR initiatives are communicated visually and textually. Image and text extraction were accomplished using the PyMuPDF and Tesseract libraries, harnessing their OCR capabilities. The identification of living objects was performed using OpenCV, while image classification was executed with the OpenAI-CLIP model. The developed framework achieved an accuracy of 81% on living-object identification using the OpenCV model and 76% on object classification using the OpenAI-CLIP model. The results indicate distinct patterns in how CSR is depicted, varying by sector, geographic location, and company size. These patterns offer key insights for developing more targeted and effective strategies for engaging with stakeholders.
|
8 |
Klasifikace dat leteckého laserového skenování v pískovcových skalních městech / Classification of Airborne Laser Scanning Data in Sandstone Landscapes
Tomková, Michaela January 2018
This work is concerned with the classification of airborne laser scanning data in sandstone landscapes called "rock cities". Standard filters do not work reliably in such rugged terrain covered with dense vegetation; in their results the rock formations are smoothed or even removed from the terrain. The classification method suggested in this work is based on the procedure used in manual filtering. When exploring a sufficiently dense point cloud in 3D, one can distinguish rock formations from trees even though their shapes are similar. In contrast to trees, rock pillars are modeled only by points reflected off the surface, and therefore they form concave elevations in the ground. Because the laser penetrates trees, there are points reflected off the treetop, branches and leaves, and also off the ground under the tree. The proposed method segments a point cloud according to local minima in an approximated surface and classifies the resulting objects into the classes rock, tree, and mix by their inner point distribution. Objects in the classes tree and mix are then filtered by the lasground function from LAStools. The method was tested with a merged point cloud consisting of data from the standard airborne laser scanning of the Czech Republic and experimental...
|
9 |
Image-based detection and classification of allergenic pollen / Détection et classification des pollens allergisants basée sur l'image
Lozano Vega, Gildardo 18 June 2015
Le traitement médical des allergies nécessite la caractérisation des pollens en suspension dans l’air. Toutefois, cette tâche requiert des temps d’analyse très longs lorsqu’elle est réalisée de manière manuelle. Une approche automatique améliorerait ainsi considérablement les applications potentielles du comptage de pollens. Les dernières techniques d’analyse d’images permettent la détection de caractéristiques discriminantes. C’est pourquoi nous proposons dans cette thèse un ensemble de caractéristiques pertinentes issues d’images pour la reconnaissance des principales classes de pollen allergènes. Le cœur de notre étude est l’évaluation de groupes de caractéristiques capables de décrire correctement les pollens en termes de forme, texture, taille et ouverture. Les caractéristiques sont extraites d’images acquises classiquement sous microscope, permettant la reproductibilité de la méthode. Une étape de sélection des caractéristiques est appliquée à chaque groupe pour évaluer sa pertinence. Concernant les apertures présentes sur certains pollens, une méthode adaptative de détection, localisation et comptage pour différentes classes de pollens avec des apparences variées est proposée. La description des apertures se base sur une stratégie de type Sac-de-Mots appliquée à des primitives issues des images. Une carte de confiance est construite à partir de la confiance donnée à la classification des régions de l’image échantillonnée. De cette carte sont extraites des caractéristiques propres aux apertures, permettant leur comptage. La méthode est conçue pour être étendue de façon modulable à de nouveaux types d’apertures en utilisant le même algorithme mais avec un classifieur spécifique. Les groupes de caractéristiques ont été testés individuellement et conjointement sur les classes de pollens les plus répandues en Allemagne.
Nous avons montré leur efficacité lors d’une classification de type SVM, notamment en surpassant la variance intra-classe et la similarité inter-classe. Les résultats obtenus en utilisant conjointement tous les groupes de caractéristiques ont abouti à une précision de 98,2 %, comparable à l’état de l’art. / The correct classification of airborne pollen is relevant for the medical treatment of allergies, and the regular manual process is costly and time-consuming. Automatic processing would considerably increase the potential of pollen counting. Modern computer vision techniques enable the detection of discriminant pollen characteristics. In this thesis, a set of relevant image-based features for the recognition of the top allergenic pollen taxa is proposed and analyzed. The foundation of our proposal is the evaluation of groups of features that can properly describe pollen in terms of shape, texture, size and apertures. The features are extracted from typical brightfield microscope images, which makes the method easy to reproduce. A feature selection process is applied to each group to determine its relevance. Regarding apertures, a flexible method for the detection, localization and counting of apertures of different pollen taxa with varying appearances is proposed. Aperture description is based on image primitives following the Bag-of-Words strategy. A confidence map is built from the classification confidence of sampled regions. From this map, aperture features are extracted, which include the count of apertures. The method is designed to be extended modularly to new aperture types, employing the same algorithm to build individual classifiers. The feature groups are tested individually and jointly on the most allergenic pollen taxa in Germany. They were shown to overcome intra-class variance and inter-class similarity in an SVM classification scheme. The global joint test led to an accuracy of 98.2%, comparable to state-of-the-art procedures.
|
10 |
Automatisierte Objekterkennung zur Interpretation hochauflösender Bilddaten in der Erdfernerkundung / Automated object recognition for the interpretation of high-resolution image data in remote sensing
Mayer, Stefan 09 June 2004
Als Datengrundlage für die Erhebung von Flächennutzungsparametern, wie sie in geografischen Informationssystemen (GIS) abgelegt und verwaltet werden, dienen oft Bilddaten aus der Erdfernerkundung. Die zur Erkennung und Unterscheidung der Objekte notwendige hohe Pixelauflösung führt bei der Erfassung eines Zielgebiets wie beispielsweise einer Stadt zu enormen Datenmengen. Aus diesen Bilddaten gilt es, möglichst schnell und preiswert die für ein GIS notwendigen Informationen, wie Umrissvektoren und Objektattribute, zu extrahieren. Diese Arbeit ist ein Beitrag zur Automatisierung dieses Schritts mit besonderem Schwerpunkt auf der Gebäudeextraktion. Als Datengrundlage kommen hochauflösende multispektrale Orthobilder und ein digitales Oberflächenmodell (DOM) der digitalen Luftbildkamera HRSC-A bzw. HRSC-AX zum Einsatz. Deswegen werden das Aufnahmeprinzip sowie die Datenverarbeitung der HRSC überblicksartig vorgestellt. Auf Basis dieser HRSC-Standarddatenprodukte wird ein Vorgehen zur Extraktion von Objekten entwickelt. In einer hierarchisch geordneten Abfolge an Segmentierungsschritten werden aus der Pixelinformation bedeutungstragende Einheiten extrahiert. Dieser Segmentierungsansatz lässt sich auf mehrere Objektkategorien, wie Straßen oder Ackerflächen, erweitern. So werden in der aktuellen Entwicklungsstufe neben Gebäuden auch Baumregionen detektiert. Anhand des Oberflächenmodells werden erhöhte Regionen erkannt. Dazu wird das DOM durch Berechnung eines Terrainmodells auf Grundhöhe normiert. Für erhöhte Objekte wird die Grundhöhe aus umliegenden Grundregionen abgeleitet. Die erhöhten Regionen werden anschließend in Bäume und Gebäude unterteilt. Dazu werden aus den Multispektraldaten Vegetationscharakteristika bestimmt und entsprechende Baumsegmente ermittelt. Die Gebäuderegionen resultieren aus einer Nachverarbeitung der verbleibenden Segmente. Um Gebäudekomplexe in einzelne Häuser aufzuteilen, wird ein gradientenbasierter Ansatz entwickelt.
Anhand der für Brandmauern typischen Gradienteninformation werden Linienhypothesen zur Unterteilung der Gebäudesegmente generiert. Diese werden schrittweise anhand geometrischer und radiometrischer Kriterien auf ihre Plausibilität überprüft. Schließlich werden die ursprünglich aus dem DOM stammenden Konturen der Gebäudesegmente und deren Übereinstimmung mit Bildkanten eines Orthobildes betrachtet. In einem adaptiven Ansatz wird das Konturpolygon durch die Gradienteninformation an angrenzende Bildkanten angepasst. Zur Umsetzung typischer Gebäudegeometrien wie rechter Winkel oder Parallelität werden innerhalb des Adaptionsprozesses entsprechende Nebenbedingungen formuliert. Die Extraktion erhöhter Objekte wie auch deren Unterteilung in Bäume und Gebäude erfolgt mit hoher Genauigkeit, z.B. liegen die Detektionsraten bei Gebäuden über 90%. Der neuartige Ansatz zur Unterteilung in einzelne Häuser ohne explizite Liniendetektion führt bereits in der vorgestellten Entwicklungsstufe zur Beschleunigung einer manuellen Interpretation. Die adaptive Verbesserung der Gebäudekontur führt zu gebäudetypischeren Umrissen ohne Beeinträchtigung der hohen Detektionsraten. / Remote sensing image data are often used as a basis for determining land use parameters as stored and managed in geographic information systems (GIS). Covering a target area such as a city produces an enormous amount of data because of the high pixel resolution required for recognizing and discriminating objects. The information needed for a GIS, such as contour vectors and object attributes, has to be extracted from these data as quickly and cheaply as possible. This thesis is a contribution to the automation of this step, with a focus on building extraction. High-resolution multispectral ortho-images and a digital surface model (DSM), generated by the digital aerial camera HRSC-A or HRSC-AX, are used as the data basis. The HRSC imaging principle and data processing are therefore summarized.
Based on these HRSC standard data products, an object extraction scheme is developed. In a hierarchically ordered sequence of segmentation steps, meaningful units are extracted from pixel information. This segmentation approach can be extended to several object categories, such as streets or fields. In the current stage of implementation, tree regions are detected in addition to buildings. Elevated regions are recognized using the digital surface model. For that purpose, the DSM is normalized to ground height by calculating a terrain model. For elevated objects, the terrain height is derived from surrounding ground regions. Subsequently, the elevated regions are separated into trees and buildings. Vegetation characteristics determined from the multispectral data yield the corresponding tree segments. The building regions result from post-processing the remaining segments. In order to split building complexes into single houses, a gradient-based approach is developed. Using the gradient information typical of fire walls, line hypotheses for subdividing the building segments are generated. Their plausibility is checked step by step against geometric and radiometric criteria. Finally, the building contours, originally derived from the DSM, and their correspondence to image edges in an ortho-image are considered. In an adaptive approach, the contour polygon is adjusted to neighboring image edges using the gradient information. Typical building geometries such as right angles or parallelism are enforced by applying corresponding constraints in the adaptation process. The extraction of elevated objects, as well as their separation into trees and buildings, is carried out with high accuracy; for example, the building detection rates are over 90%. Even at the current development stage, the novel approach for separating building segments into single houses without explicit line detection speeds up manual interpretation.
The adaptive improvement of building contours yields more building-typical outlines without affecting the high detection rates.
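The first two segmentation steps described in the abstract above (normalizing the DSM with a terrain model, then separating elevated regions into trees and buildings) can be sketched per raster cell as follows. The sketch makes assumptions the abstract does not state: the terrain model is taken as a given input rather than computed, vegetation is detected with the NDVI computed from near-infrared and red bands, and the function names and thresholds are hypothetical values chosen for illustration.

```python
def normalize_dsm(dsm, dtm):
    """Height above ground: surface model minus terrain model, cell by cell."""
    return [[s - t for s, t in zip(srow, trow)] for srow, trow in zip(dsm, dtm)]


def ndvi(nir, red):
    """Normalized difference vegetation index; 0.0 where both bands are zero."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def classify(dsm, dtm, nir, red, min_height=2.5, min_ndvi=0.3):
    """Label each cell 'ground', 'tree' or 'building'.

    The NDVI test and both thresholds are illustrative assumptions, not
    values taken from the thesis.
    """
    ndsm = normalize_dsm(dsm, dtm)
    labels = []
    for r, row in enumerate(ndsm):
        out = []
        for c, height in enumerate(row):
            if height < min_height:
                out.append("ground")      # not elevated above the terrain
            elif ndvi(nir[r][c], red[r][c]) >= min_ndvi:
                out.append("tree")        # elevated and vegetated
            else:
                out.append("building")    # elevated, no vegetation signal
        labels.append(out)
    return labels
```

A per-cell rule like this only produces candidate regions. The post-processing of the remaining segments, the splitting of building complexes and the contour adaptation described in the abstract are not covered here.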
|