Global ETD Search

31	Vliv barevných modelů na chování konvolučních neuronových sítí / Impact of color models on performance of convolutional neural networks Šimunský, Martin January 2020 The first part of this thesis surveys current knowledge about the impact of colour models on the performance of convolutional neural networks. The second part reports an experiment based on that survey. Six colour models (RGB, HSV, CIE 1931 XYZ, CIE 1976 L*a*b*, YIQ and YCbCr) are used with the deep convolutional neural network ResNet-101. In this experiment the RGB colour model achieved the highest classification accuracy, whereas the HSV colour model had the lowest.
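The abstract does not say how the images were converted between colour models, so the following is only a minimal sketch of such a conversion step for a single pixel. It relies on Python's standard `colorsys` module for HSV and YIQ; the YCbCr branch assumes full-range BT.601 weights, and the function name `to_color_model` is hypothetical.

```python
import colorsys


def to_color_model(rgb, model):
    """Convert one RGB pixel (floats in [0, 1]) to the named colour model."""
    r, g, b = rgb
    if model == "RGB":
        return (r, g, b)
    if model == "HSV":
        return colorsys.rgb_to_hsv(r, g, b)
    if model == "YIQ":
        return colorsys.rgb_to_yiq(r, g, b)
    if model == "YCbCr":
        # Assumption: full-range BT.601 luma weights, chroma centred on 0.5.
        y = 0.299 * r + 0.587 * g + 0.114 * b
        cb = 0.5 + (b - y) / 1.772
        cr = 0.5 + (r - y) / 1.402
        return (y, cb, cr)
    raise ValueError(f"unsupported colour model: {model}")
```

In a training pipeline a conversion like this would be applied to every pixel before the image reaches the network, so each colour model yields a differently encoded copy of the same dataset.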
32	Transfer learning between domains : Evaluating the usefulness of transfer learning between object classification and audio classification Frenger, Tobias, Häggmark, Johan January 2020 Convolutional neural networks have been successfully applied to both object classification and audio classification. The aim of this thesis is to evaluate how well convolutional neural networks trained in the object classification domain on large datasets (such as CIFAR-10 and ImageNet) can be transferred to the audio classification domain when only a small dataset is available. Four different convolutional neural networks are tested with three configurations of transfer learning against a configuration without transfer learning. This makes it possible to test how transfer learning and the architectural complexity of the networks affect performance. Two of the models, Inception-V3 and Inception-ResNet-V2, were developed by Google. They are implemented using the Keras API, where they are pre-trained on the ImageNet dataset. The thesis also introduces two new architectures developed by its authors, Mini-Inception and Mini-Inception-ResNet, which are inspired by Inception-V3 and Inception-ResNet-V2 but have significantly lower complexity. The audio classification dataset consists of audio from RC boats, transformed into mel-spectrogram images. To make transfer learning possible, Mini-Inception and Mini-Inception-ResNet are pre-trained on the CIFAR-10 dataset. The results show that transfer learning does not increase performance. However, in some cases it does enable models to reach higher performance in the earlier stages of training.
Convolutional neural networks Object classification Audio classification Transfer learning Inception-V3 Inception-ResNet-V2 Keras ImageNet Mini-Inception Mini-Inception-ResNet Mel-spectrogram CIFAR-10 Information Systems, Social aspects
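The record does not give the parameters used to turn the audio into mel-spectrogram images. As a rough illustration of what the mel scale does, the sketch below uses one common convention (the 2595 * log10(1 + f/700) formula); other variants exist, and the function names and band layout are assumptions, not the thesis's settings.

```python
import math


def hz_to_mel(freq_hz):
    """Map a frequency in Hz to mels (2595 * log10(1 + f / 700) convention)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 frequencies (Hz) equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]
```

Band edges spaced this way are dense at low frequencies and sparse at high ones, which is what gives a mel-spectrogram its compressed upper range compared with a linear spectrogram.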
33	2D/3D knowledge inference for intelligent access to enriched visual content / Modélisation et inférence 2D/3D de connaissances pour l'accès intelligent aux contenus visuels enrichis Sambra-Petre, Raluca-Diana 18 June 2013 This Ph.D. thesis tackles the issue of still and video object categorization.
The objective is to associate semantic labels with 2D objects present in natural images and videos. The proposed approach exploits categorized 3D model repositories in order to identify unknown 2D objects using 2D/3D matching techniques. We propose an object recognition framework designed to work in real-time applications. The similarity between classified 3D models and unknown 2D content is evaluated with the help of the 2D/3D description. A voting procedure is then employed to determine the most probable categories of the 2D object. A strategy for selecting representative viewing angles and a new contour-based descriptor (called AH) are proposed. The experimental evaluation showed that, with the intelligent selection of views, the number of projections can be decreased significantly (up to 5 times) while obtaining similar performance. The results also showed the superiority of AH over the other state-of-the-art descriptors considered. An objective evaluation of the intra- and inter-class variability of the 3D model repositories involved in this work is also proposed, together with a comparative study of the retained indexing approaches. An interactive, scribble-based segmentation approach is also introduced. The proposed method is specifically designed to overcome compression artefacts such as those introduced by JPEG compression. Finally, we present a Web platform for indexing, retrieval and classification, called Diana, which integrates the various methodologies employed in this thesis. Object classification Object segmentation 2D/3D indexing 2D/3D inference Shape descriptor 3D model database
34	OBJECT DETECTION USING VISION TRANSFORMED EFFICIENTDET Shreyanil Kar (16285265) 30 August 2023 This research presents a novel approach for object detection by integrating Vision Transformers (ViT) into the EfficientDet architecture. The field of computer vision, encompassing artificial intelligence, focuses on the interpretation and analysis of visual data. Recent advancements in deep learning, particularly convolutional neural networks (CNNs), have significantly improved the accuracy and efficiency of computer vision systems. Object detection, a widely studied application within computer vision, involves the identification and localization of objects in images. The ViT backbone, renowned for its success in image classification and natural language processing tasks, employs self-attention mechanisms to capture global dependencies in input images. However, ViT’s capability to capture fine-grained details and context information is limited. To address this limitation, the integration of ViT into the EfficientDet architecture is proposed. EfficientDet is recognized for its efficiency and accuracy in object detection. By combining the strengths of ViT and EfficientDet, the proposed integration enhances the network’s ability to capture fine-grained details and context information. It leverages ViT’s global dependency modeling alongside EfficientDet’s efficient object detection framework, resulting in highly accurate and efficient performance. Noteworthy object detection frameworks utilized in the industry, such as RetinaNet, EfficientNet, and EfficientDet, primarily employ convolution. Experimental evaluations were conducted using the PASCAL VOC 2007 and 2012 datasets, widely acknowledged benchmarks for object detection. The integrated ViT-EfficientDet model achieved an impressive mean Average Precision (mAP) score of 86.27% when tested on the PASCAL VOC 2007 dataset, demonstrating its superior accuracy.
These results underscore the potential of the proposed integration for real-world applications. In conclusion, the research introduces a novel integration of Vision Transformers into the EfficientDet architecture, yielding significant improvements in object detection performance. By combining ViT’s ability to capture global dependencies with EfficientDet’s efficiency and accuracy, the proposed approach offers enhanced object detection capabilities. Future research directions may explore additional datasets and evaluate the performance of the proposed framework across various computer vision tasks. Computer vision Convolutional Neural Network PASCAL VOC EfficientDet Vision Transformer Object Detection Hybrid CNN Shreyanil Kar ViT-EfficientDet efficient model Artificial Intelligence Computer vision pytorch framework deep convolutional neural networks Machine learning Algorithm Object Classification Neural Networks
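The mAP figure quoted above depends on deciding when a predicted box counts as a correct localization, which on PASCAL VOC-style benchmarks is done with intersection over union (IoU) against a ground-truth box. The sketch below shows that overlap measure only; the `(x_min, y_min, x_max, y_max)` box layout and the function name are assumptions, and it is not the evaluation code used in the thesis.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max) with x_min <= x_max, y_min <= y_max.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0
```

A detection is usually treated as a true positive when its IoU with a ground-truth box of the same class exceeds a threshold (0.5 is the common VOC setting); average precision is then computed per class from the ranked detections and averaged to give mAP.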
35	Object Detection in Domain Specific Stereo-Analysed Satellite Images Grahn, Fredrik, Nilsson, Kristian January 2019 Given satellite images with accompanying pixel classifications and elevation data, we propose different solutions to object detection. The first method uses hierarchical clustering for segmentation and then employs different methods of classification. One of these classification methods used domain knowledge to classify objects, while the other used Support Vector Machines. Additionally, a combination of three Support Vector Machines was used in a hierarchical structure, which outperformed the regular Support Vector Machine method in most of the evaluation metrics. The second approach is more conventional and uses different types of Convolutional Neural Networks. A segmentation network was used, as well as a few detection networks and different fusions between these. The Convolutional Neural Network approach proved to be the better of the two in terms of precision and recall, but the clustering approach was not far behind. This work was done using a relatively small amount of data, which could have negatively affected the results of the Machine Learning models. object detection object classification clustering hierarchical clustering object localisation machine learning ai image localisation image segmentation semantic segmentation remote sensing images satellite images domain knowledge support vector machines svm convolutional neural network cnn fully convolutional network fcn you only look once yolo network fusion Computer Sciences Datavetenskap (datalogi)
36	Automatic segmentation and reconstruction of traffic accident scenarios from mobile laser scanning data Vock, Dominik 08 May 2014 Virtual reconstruction of historic sites, planning of restorations and attachments of new building parts, as well as forest inventory are a few examples of fields that benefit from the application of 3D surveying data. Compared with the original 2D photo-based documentation and manual distance measurements, the 3D information obtained from multi-camera and laser scanning systems brings a noticeable improvement in surveying times and in the amount of generated 3D information. The 3D data allows detailed post-processing and better visualization of all relevant spatial information. Yet extracting the required information from the raw scan data and generating usable visual output still requires time-consuming, complex user-based data processing with commercially available 3D software tools. In this context, automatic object recognition from 3D point cloud and depth data has been discussed in many different works. The developed tools and methods, however, usually focus only on a certain kind of object or on the detection of learned invariant surface shapes. Although the resulting methods are applicable to certain data segmentation practices, they are not necessarily suitable for arbitrary tasks because the requirements of the different fields of research vary. This thesis presents a more general solution for automatic scene reconstruction from 3D point clouds, targeting street scenarios, specifically for the task of traffic accident scene analysis and documentation. The data, obtained by sampling the scene with a mobile scanning system, is evaluated, segmented, and finally used to generate detailed 3D information of the scanned environment. 
To realize this aim, this work adapts and validates various existing approaches on laser scan segmentation regarding the application on accident relevant scene information, including road surfaces and markings, vehicles, walls, trees and other salient objects. The approaches are therefore evaluated regarding their suitability and limitations for the given tasks, as well as for possibilities concerning the combined application together with other procedures. The obtained knowledge is used for the development of new algorithms and procedures to allow a satisfying segmentation and reconstruction of the scene, corresponding to the available sampling densities and precisions. Besides the segmentation of the point cloud data, this thesis presents different visualization and reconstruction methods to achieve a wider range of possible applications of the developed system for data export and utilization in different third party software tools. Laserscan 3D Rekonstruktion Unfallforschung Punktwolke mobile laser scanning MLS Objekterkennung Objektdetektion Punktwolkensegmentierung Objekt-klassifizierung laser scanning 3d reconstruction accident research point cloud mobile laser scanning MLS object recognition object detection point cloud segmentation point cloud processing object classification ddc:550 rvk:ZI 9510
37	Automatické třídění fotografií podle obsahu / Automatic Photography Categorization Gajová, Veronika January 2012 The purpose of this thesis is to design and implement a tool for automatic categorization of photos. The proposed tool is based on the Bag of Words classification method and is realized as a plug-in for the XnView image viewer. The plug-in can classify a selected group of photos into predefined image categories. The assigned category is then written directly into the picture's IPTC metadata as a keyword.
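The abstract names the Bag of Words method without describing the plug-in's implementation. As a minimal sketch of the core idea, the code below turns an image's local feature descriptors into a normalized histogram over a fixed codebook of "visual words"; how the descriptors are extracted and how the codebook is learned are left out, and the function name `bow_histogram` is hypothetical.

```python
import math


def bow_histogram(descriptors, codebook):
    """Build a normalized bag-of-words histogram.

    descriptors: list of equal-length numeric tuples (local image features).
    codebook: list of numeric tuples (visual words), same length as descriptors.
    """
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        # Assign the descriptor to its nearest visual word (Euclidean distance).
        nearest = min(range(len(codebook)),
                      key=lambda k: math.dist(descriptor, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    # Normalize so images with different numbers of features are comparable.
    return [c / total for c in counts] if total else [0.0] * len(codebook)
```

The resulting histogram is the fixed-length vector that a classifier would receive to choose among the predefined image categories.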
38	Automatic segmentation and reconstruction of traffic accident scenarios from mobile laser scanning data Vock, Dominik 18 December 2013 Virtual reconstruction of historic sites, planning of restorations and attachments of new building parts, as well as forest inventory are a few examples of fields that benefit from the application of 3D surveying data. Compared with the original 2D photo-based documentation and manual distance measurements, the 3D information obtained from multi-camera and laser scanning systems brings a noticeable improvement in surveying times and in the amount of generated 3D information. The 3D data allows detailed post-processing and better visualization of all relevant spatial information. Yet extracting the required information from the raw scan data and generating usable visual output still requires time-consuming, complex user-based data processing with commercially available 3D software tools. In this context, automatic object recognition from 3D point cloud and depth data has been discussed in many different works. The developed tools and methods, however, usually focus only on a certain kind of object or on the detection of learned invariant surface shapes. Although the resulting methods are applicable to certain data segmentation practices, they are not necessarily suitable for arbitrary tasks because the requirements of the different fields of research vary. This thesis presents a more general solution for automatic scene reconstruction from 3D point clouds, targeting street scenarios, specifically for the task of traffic accident scene analysis and documentation. The data, obtained by sampling the scene with a mobile scanning system, is evaluated, segmented, and finally used to generate detailed 3D information of the scanned environment. 
To realize this aim, this work adapts and validates various existing approaches on laser scan segmentation regarding the application on accident relevant scene information, including road surfaces and markings, vehicles, walls, trees and other salient objects. The approaches are therefore evaluated regarding their suitability and limitations for the given tasks, as well as for possibilities concerning the combined application together with other procedures. The obtained knowledge is used for the development of new algorithms and procedures to allow a satisfying segmentation and reconstruction of the scene, corresponding to the available sampling densities and precisions. Besides the segmentation of the point cloud data, this thesis presents different visualization and reconstruction methods to achieve a wider range of possible applications of the developed system for data export and utilization in different third party software tools. info:eu-repo/classification/ddc/550 ddc:550
39	Entwicklung und Validierung methodischer Konzepte einer kamerabasierten Durchfahrtshöhenerkennung für Nutzfahrzeuge Hänert, Stephan 03 July 2020 The present work deals with the conception and development of a novel advanced driver assistance system for commercial vehicles, which estimates the clearance height of obstacles in front of the vehicle and determines the passability by comparison with the adjustable vehicle height. 
The image sequences captured by a mono camera are used to create a 3D representation of the driving environment using indirect and direct reconstruction methods. The 3D representation is scaled and a prediction of the longitudinal and lateral movement of the vehicle is determined with the aid of a wheel odometry-based estimation of the vehicle's own movement. Based on the vertical elevation plan of the road surface, which is modelled by attaching several surfaces together, the 3D space is classified into driving surface, structure and potential obstacles. The obstacles within the predicted driving tube are evaluated with regard to their distance and height. A warning concept derived from this serves to visually and acoustically signal the obstacle in the vehicle's instrument cluster. If the driver does not respond accordingly, emergency braking will be applied at critical obstacle heights. The estimated vehicle movement and calculated obstacle parameters are evaluated with the aid of reference sensors. A dGPS-supported inertial measurement unit and a terrestrial as well as a mobile laser scanner are used. Within the scope of the work, different environmental situations and obstacle types in urban and rural areas are investigated and statements on the accuracy and reliability of the implemented function are made. A major factor influencing the density and accuracy of 3D reconstruction is uniform ambient lighting within the image sequence. In this context, the use of an automotive camera is mandatory. The inherent motion determined by wheel odometry is suitable for scaling the 3D point space in the slow speed range. The 3D representation however, should be created by a combination of indirect and direct point reconstruction methods. The indirect part supports the initialization phase of the function and enables a robust camera estimation. 
The direct method enables the reconstruction of a large number of 3D points on the obstacle outlines, which usually contain the lower edge. The lower edge can be detected and tracked up to 20 m away. The biggest factor influencing the accuracy of the calculation of the clearance height of obstacles is the modelling of the driving surface. To reduce outliers in the height calculation, the method can be stabilized by using calculations from older time steps. As a further stabilization measure, it is also recommended to support the obstacle output to the driver and the automatic emergency brake assistant by means of hysteresis. The system presented here is suitable for parking and maneuvering operations and is interesting as a cost-effective driver assistance system for cars with superstructures and light commercial vehicles. info:eu-repo/classification/ddc/380 ddc:380
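The abstract recommends stabilizing the obstacle output with hysteresis but gives no thresholds. The sketch below shows the general technique under stated assumptions: the warning switches on when the estimated clearance drops below the vehicle height plus a small margin, and switches off only once the clearance exceeds a larger margin. The class name and the margin values are hypothetical, not taken from the thesis.

```python
class ClearanceWarning:
    """Two-threshold (hysteresis) warning on an estimated clearance height."""

    def __init__(self, vehicle_height_m, on_margin_m=0.2, off_margin_m=0.5):
        if off_margin_m <= on_margin_m:
            raise ValueError("off_margin_m must exceed on_margin_m")
        # Assumed margins: warn below height + 0.2 m, release above height + 0.5 m.
        self.on_threshold = vehicle_height_m + on_margin_m
        self.off_threshold = vehicle_height_m + off_margin_m
        self.active = False

    def update(self, clearance_m):
        """Feed one clearance estimate (metres) and return the warning state."""
        if not self.active and clearance_m < self.on_threshold:
            self.active = True
        elif self.active and clearance_m > self.off_threshold:
            self.active = False
        return self.active
```

Because the two thresholds differ, a noisy estimate that hovers around a single limit cannot make the warning flicker on and off between consecutive frames.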
40	Advancing profiling sensors with a wireless approach Galvis, Alejandro 20 November 2013 Indiana University-Purdue University Indianapolis (IUPUI) / In general, profiling sensors are low-cost crude imagers that typically utilize a sparse detector array, whereas traditional cameras employ a dense focal-plane array. Profiling sensors are of particular interest in applications that require classification of a sensed object into broad categories, such as human, animal, or vehicle. However, profiling sensors have many other applications in which reliable classification of a crude silhouette or profile produced by the sensor is of value. The notion of a profiling sensor was first realized by a Near-Infrared (N-IR), retro-reflective prototype consisting of a vertical column of sparse detectors. Alternative arrangements of detectors have been implemented in which a subset of the detectors have been offset from the vertical column and placed at arbitrary locations along the anticipated path of the objects of interest. All prior work with the N-IR, retro-reflective profiling sensors has consisted of wired detectors. This thesis surveys prior work and advances this work with a wireless profiling sensor prototype in which each detector is a wireless sensor node and the aggregation of these nodes comprises a profiling sensor’s field of view. In this novel approach, a base station pre-processes the data collected from the sensor nodes, including data realignment, prior to its classification through a back-propagation neural network. Such a wireless detector configuration advances deployment options for N-IR, retro-reflective profiling sensors. Wireless sensor networks -- Research Signal processing Information display systems Array processors -- Analysis Optical data processing
