21

3D rekonstrukce z více pohledů kamer / 3D reconstruction from multiple views

Sládeček, Martin January 2019
This thesis deals with the task of three-dimensional scene reconstruction from image data obtained from multiple views. It is assumed that the intrinsic parameters of the cameras are known. The theoretical chapters describe the basic principles of the individual reconstruction steps and various possible implementations of data models suitable for this task. The practical part includes a comparison of methods for filtering false keypoint correspondences, an implementation of polar stereo rectification, and a comparison of the disparity map calculation methods bundled with the OpenCV library. The final portion of the thesis presents and discusses examples of reconstructed 3D models.
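As a pointer to what such a comparison involves, here is a minimal sketch of one of the disparity methods bundled with OpenCV (semi-global block matching); the file names and parameter values are illustrative assumptions, not the thesis's configuration:

```python
import cv2

# Hypothetical rectified stereo pair; polar rectification (as in the thesis)
# or any other rectification must already have been applied.
left = cv2.imread("left_rectified.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rectified.png", cv2.IMREAD_GRAYSCALE)

# StereoSGBM, one of the disparity calculators shipped with OpenCV.
matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,   # search range; must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,         # penalty for small disparity changes
    P2=32 * 5 * 5,        # penalty for large disparity changes
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2,
)

# compute() returns fixed-point disparities scaled by 16
disparity = matcher.compute(left, right).astype("float32") / 16.0
```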
22

Natural scene classification, annotation and retrieval. Developing different approaches for semantic scene modelling based on Bag of Visual Words.

Alqasrawi, Yousef T. N. January 2012
With the availability of inexpensive hardware and software, digital imaging has become an important medium of communication in our daily lives. Huge numbers of digital images are collected and made available through the internet, and stored in fields such as personal image collections, medical imaging and digital art, so it is important that images can be stored, searched and accessed efficiently. The bag of visual words (BOW) model, which represents images by local invariant features computed at interest point locations, has become a standard choice for many computer vision tasks. Building on this promising model, this thesis investigates three main problems: natural scene classification, annotation and retrieval. Given an image, the task is to design a system that can determine to which class the image belongs (classification), what semantic concepts it contains (annotation), and which images are most similar to it (retrieval). The thesis contributes to scene classification by proposing a weighting approach, named the keypoints density-based weighting method (KDW), to control the fusion of colour information and bag of visual words on a spatial pyramid layout in a unified framework. Different configurations of BOW, integrated visual vocabularies and multiple image descriptors are investigated and analyzed. The proposed approaches are extensively evaluated on three well-known scene classification datasets with 6, 8 and 15 scene categories using 10-fold cross-validation. The second contribution, the scene annotation task, explores whether the integrated visual vocabularies generated for scene classification can be used to model the local semantic information of natural scenes. Here image annotation is treated as a classification problem: images are partitioned into a fixed 10×10 grid, and each block, represented by BOW and different image descriptors, is classified into one of several predefined semantic classes. An image is then represented by the percentage of each semantic concept detected in it. Experimental results on 6 scene categories demonstrate the effectiveness of the proposed approach. Finally, the thesis further explores, with extensive experimental work, the use of different configurations of the BOW for natural scene retrieval. / Applied Science University in Jordan
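As a rough illustration of the BOW model the thesis builds on, a minimal pipeline might look like the following sketch; the vocabulary size, the helper names and the training lists (train_paths, train_labels) are assumptions, not the thesis's implementation:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def sift_descriptors(image_paths):
    """Collect local SIFT descriptors per image."""
    sift = cv2.SIFT_create()
    per_image = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(gray, None)
        per_image.append(desc if desc is not None else np.empty((0, 128), np.float32))
    return per_image

def bow_histograms(per_image_desc, vocabulary):
    """Quantize descriptors against the visual vocabulary and build an
    L1-normalized visual-word histogram per image."""
    hists = []
    for desc in per_image_desc:
        if len(desc) == 0:
            hists.append(np.zeros(vocabulary.n_clusters))
            continue
        words = vocabulary.predict(desc)
        hist = np.bincount(words, minlength=vocabulary.n_clusters)
        hists.append(hist / hist.sum())
    return np.array(hists)

# train_paths and train_labels are assumed to be given
train_desc = sift_descriptors(train_paths)
vocabulary = KMeans(n_clusters=200, n_init=4, random_state=0).fit(np.vstack(train_desc))
classifier = SVC(kernel="rbf").fit(bow_histograms(train_desc, vocabulary), train_labels)
```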
23

Automatic Detection of Structural Deformations in Batteries from Imaging data using Machine Learning : Exploring the potential of different approaches for efficient structural deformation detection / Automatisk detektering av strukturella deformationer i batterier från bilddata med maskininlärning

Khan, Maira January 2023
The increasing occurrence of structural deformations in the electrodes of the jelly roll has raised quality concerns during battery manufacturing, emphasizing the need to detect such deformations automatically with advanced techniques. This thesis explores and provides two models, based on traditional computer vision (CV) and deep neural network (DNN) techniques, that use computed tomography (CT) scans of jelly rolls to ensure that the product is of high quality. In both approaches, electrode peaks are detected as keypoints of the anodes and cathodes in prismatic lithium battery jelly rolls, and geometric features are extracted from them to identify whether a particular jelly roll has structural deformations. In the traditional CV method, the images undergo pre-processing steps, extraction of the foreground through adaptive thresholding, and morphological operations to extract contour edges, followed by a Harris corner detector to find the electrode peaks. This approach, however, has limitations in detecting small or negative distance differences in deformed images. The study therefore proposes a second approach based on supervised transfer learning, using deep learning models pre-trained on annotated data. After exploring different architectures, the VGG19 model pre-trained on the ImageNet dataset outperformed the other architectures, even with limited training data, achieving a maximum accuracy of 93.13 % at a 1-pixel distance, 98.87 % at a 5-pixel distance and 99.29 % at a 10-pixel distance on test data; the performance metrics used are Percentage of Correct Keypoints (PCK), mean-square error and Huber loss. This baseline thus proves to be a valuable tool for detecting structural deformations in jelly rolls. Moreover, a GUI-based executable application was developed using both approaches, raising an OK or NG flag for structural deformations in each jelly roll.
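Of the metrics named above, the Percentage of Correct Keypoints is simple enough to state in full; a minimal sketch (the keypoint arrays are assumed inputs, not the thesis's data):

```python
import numpy as np

def pck(pred, gt, threshold_px):
    """Percentage of Correct Keypoints: a prediction counts as correct
    if it lies within threshold_px pixels of the ground truth.
    pred, gt: (N, 2) arrays of (x, y) coordinates."""
    distances = np.linalg.norm(pred - gt, axis=1)
    return float(np.mean(distances <= threshold_px))

# Thresholds matching those reported in the abstract; the keypoint
# arrays below are assumed to exist.
for t in (1, 5, 10):
    print(f"PCK@{t}px:", pck(predicted_keypoints, ground_truth_keypoints, t))
```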
24

Voronoi tessellation quality: applications in digital image analysis

A-iyeh, Enoch January 1900
A measure of the quality of Voronoi tessellations produced by various mesh generators founded on feature-driven models is introduced in this work. A planar tessellation covers an image with polygons of various shapes and sizes. Tessellations have potential utility due to their geometry and the opportunity to derive useful information from them for object recognition, image processing and classification. Problem domains such as images are generally feature-endowed, non-random domains. Generators modelled without reference to these features may easily guarantee mesh quality, but bear no relation to the features of the meshed problem domain; they are therefore unsuitable for point pattern identification and characterization, and subsequently for the study of meshed regions. We therefore found the generators on features of the problem domain. This provides a basis for element quality studies and improvement based on quality criteria. The resulting polygonal meshes, tessellating an n-dimensional digital image into convex regions, are of varying element quality. Given several types of mesh generating sets, a measure of overall solution quality is introduced to determine their effectiveness. A tessellation of general and mixed shapes presents a challenge for quality improvement; the Centroidal Voronoi Tessellation (CVT) technique is developed to improve and guarantee the quality of mixed, general-shaped elements while preserving the validity of the tessellations. The mesh quality indicators and entropies introduced are useful for pattern studies, analysis, recognition and assessing information. Computed features of tessellated spaces are explored for assessing image information content and for cell processing to expose detail using information-theoretic methods. Tessellated spaces also furnish information on pattern structure and organization through their quality distributions. Mathematical and theoretical results obtained from these spaces help in understanding Voronoi diagrams as well as in applying them successfully. Voronoi diagrams expose neighbourhood relations between pattern units; given this realization, a foundation of near sets is developed for further applications. / February 2017
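One standard way to compute a CVT, offered here as a sketch rather than the thesis's method: k-means over densely sampled pixel coordinates is exactly Lloyd's iteration, so the converged centres are CVT generators. A feature-driven variant, as the thesis advocates, would weight the samples by an image-derived density instead of sampling uniformly:

```python
import numpy as np
from sklearn.cluster import KMeans

# Uniform sampling of an image domain; sizes are illustrative.
h, w, n_generators = 256, 256, 50
ys, xs = np.mgrid[0:h, 0:w]
pixels = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)

# k-means == Lloyd's iteration: centroids converge to CVT generators.
kmeans = KMeans(n_clusters=n_generators, n_init=1, random_state=0).fit(pixels)
generators = kmeans.cluster_centers_    # CVT generator locations
regions = kmeans.labels_.reshape(h, w)  # Voronoi region index of each pixel
```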
25

3D Rekonstrukce historických míst z obrázků na Flickru / 3D Reconstruction of Historic Landmarks from Flickr Pictures

Šimetka, Vojtěch January 2015
This thesis describes the design and development of an application for reconstructing 3D models from 2D image data, a process referred to as bundle adjustment. The thesis analyses the 3D reconstruction process and describes its individual steps in detail. The first step is the automated acquisition of an image set from the internet. A set of scripts for bulk downloading of images from the Flickr and Google Images services is presented, together with a summary of the requirements these images must meet for the best possible 3D reconstruction. The thesis then describes various detectors, extractors and matching algorithms for image keypoints, with the aim of finding the most suitable combination for reconstructing buildings. The process of reconstructing the 3D structure and its optimization is then explained, along with how it is realized in our program. Finally, the thesis evaluates the results obtained from the implemented program on several different datasets and compares them with the results of other similar programs introduced at the beginning of the thesis.
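For orientation, one detect-match-verify step of such a pipeline might look like the following sketch; the intrinsic matrix K, the image files and the ratio threshold are assumptions, not the thesis's chosen combination:

```python
import cv2
import numpy as np

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[1000.0, 0.0, 640.0],   # assumed camera intrinsics
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

# Detect and describe keypoints (SIFT as one candidate detector/extractor).
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test discards ambiguous correspondences.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# RANSAC on the essential matrix removes remaining outliers, then the
# relative pose (R, t) between the two views is recovered.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
```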
26

Automatické třídění fotografií podle obsahu / Automatic Photography Categorization

Gajová, Veronika January 2012
The purpose of this thesis is to design and implement a tool for the automatic categorization of photos. The proposed tool is based on the Bag of Words classification method and is realized as a plug-in for the XnView image viewer. The plug-in can classify a selected group of photos into predefined image categories; the resulting category labels are then written directly into the IPTC metadata of each picture as keywords.
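The tagging step could, for instance, be scripted around an external tool such as exiftool; this sketch assumes a trained classifier and a bow_histogram helper, neither of which is the plug-in's actual API:

```python
import subprocess

def tag_photo(path, classifier, bow_histogram):
    # classifier and bow_histogram are assumed stand-ins for a trained
    # Bag-of-Words pipeline; exiftool writes the IPTC keyword in place.
    category = classifier.predict([bow_histogram(path)])[0]
    subprocess.run(["exiftool", f"-IPTC:Keywords+={category}",
                    "-overwrite_original", path], check=True)
```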
27

Spatio-Temporal Networks for Human Activity Recognition based on Optical Flow in Omnidirectional Image Scenes

Seidel, Roman 29 February 2024 (has links)
The ability of the human visual system to perceive movement in the surrounding environment is called motion perception: the attention of our visual system is drawn primarily to objects that are moving. This dissertation exploits this property of human motion perception to infer human activity from data using artificial neural networks. One of the main aims of the thesis is to discover which modalities, namely RGB images, optical flow and human keypoints, are best suited for human activity recognition (HAR) in omnidirectional data. Since these modalities are not yet available for omnidirectional cameras, they are both generated synthetically and captured with a real omnidirectional camera. A distinction is therefore made between a synthetically generated omnidirectional dataset and a real omnidirectional dataset, which was recorded in a Living Lab at Chemnitz University of Technology and subsequently annotated by hand. The synthetic dataset, called OmniFlow, consists of RGB images, optical flow in forward and backward directions, segmentation masks, bounding boxes for the class people, and human keypoints. The real-world dataset, OmniLab, contains RGB images from two top-view scenes as well as manually annotated human keypoints and estimated forward optical flow. This thesis explains the generation of both datasets. OmniFlow is generated using the 3D rendering engine Blender, in which a fully configurable 3D indoor environment is created with artificially textured rooms, human activities, objects and different lighting scenarios. A randomly placed virtual camera following the omnidirectional camera model renders the RGB images, all other modalities and 15 predefined activities; the result of modelling this 3D indoor environment is the OmniFlow dataset. Due to the lack of omnidirectional optical flow data, OmniFlow is validated using Test-Time Augmentation (TTA). Compared to the baseline, Recurrent All-Pairs Field Transforms (RAFT) trained on the FlyingChairs and FlyingThings3D datasets, it was found that only about 1000 images are needed for fine-tuning to obtain a very low End-point Error (EE), and that TTA affects the EE on the OmniFlow test set by about a factor of three. As a basis for generating artificial keypoints with action labels on OmniFlow, the Carnegie Mellon University motion capture database is used, providing a large number of sports and household activities as skeletal data defined in the BVH format. From the BVH skeletal data, the skeletal points of the people performing the activities can be derived directly, or extrapolated by projecting these points from the 3D world into an omnidirectional 2D image. The real-world dataset, OmniLab, was recorded in two rooms of the Living Lab with five different people mimicking the 15 actions of OmniFlow; human keypoint annotations were added manually in two iterations to reduce the rate of incorrect annotations. Activity-level evaluation was investigated using a Temporal Segment Network (TSN) and a PoseC3D network. The TSN consists of two CNNs: a spatial component trained on RGB images and a temporal component trained on the dense optical flow fields of OmniFlow. The PoseC3D network, an approach to skeleton-based activity recognition, uses a heatmap stack of keypoints in combination with 3D convolution, making the network more effective at learning spatio-temporal features than methods based on 2D convolution.
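For reference, the End-point Error used for this validation is the mean Euclidean distance between predicted and ground-truth flow vectors; a minimal sketch (the (H, W, 2) array layout is an assumption):

```python
import numpy as np

def endpoint_error(flow_pred, flow_gt):
    """Mean End-point Error (EE): average Euclidean distance between
    predicted and ground-truth flow vectors, arrays of shape (H, W, 2)."""
    return float(np.mean(np.linalg.norm(flow_pred - flow_gt, axis=-1)))
```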
In the first step, the networks were trained and validated on the synthetically generated OmniFlow dataset. In the second step, training was performed on OmniFlow and validation on the real-world OmniLab dataset. For both networks, TSN and PoseC3D, three hyperparameters were varied and the top-1, top-5 and mean accuracy reported: first, the learning rate of the Stochastic Gradient Descent (SGD) optimizer; second, the clip length, i.e. the number of consecutive frames the network learns from; and third, the spatial resolution of the input data, for which five different image sizes were generated by cropping the original OmniFlow and OmniLab images. Keypoint-based HAR with PoseC3D performed best compared with activity classification based on optical flow or RGB images, reaching a top-1 accuracy of 0.3636, a top-5 accuracy of 0.7273 and a mean accuracy of 0.3750, and showing that the most appropriate output resolution is 128 px × 128 px and that the clip length should be at least 24 consecutive frames. The best results were achieved with a PoseC3D learning rate of 10^-3. In addition, confusion matrices indicating the class-wise accuracy of the 15 activity classes are given for the modalities RGB images, optical flow and human keypoints. The confusion matrix for the RGB modality shows the best TSN classification result for the action walk, with an accuracy of 1.00, but on real-world data almost all other actions are also classified as walk. Classification of human actions based on optical flow works best for the actions sit in chair and stand up, with an accuracy of 1.00, and walk, with 0.50; moreover, almost all actions are classified as sit in chair and stand up, which indicates that the separation between the selected action classes is poor and the TSN cannot distinguish between them. Validated on real-world data, the keypoint modality performs best for the actions rugpull (1.00) and cleaning windows (0.75). The PoseC3D network, operating on a time series of human keypoints, is therefore less sensitive to variations in viewing angle between the synthetic and real-world data than the RGB image and optical flow modalities. The pipeline for generating synthetic data needs to be investigated in future work with regard to a more uniform distribution of motion magnitudes: random placement of the person and other objects is not sufficient to cover all motion magnitudes. The synthetic data could be further improved by rotating the person around their own axis, so that the person moves in a different direction while performing the activity and the motion magnitudes contain more variance. Furthermore, the domain transition between synthetic and real-world data should be examined in terms of viewpoint invariance and augmentation methods; it may be necessary to generate a new synthetic dataset with only top-view data and retrain the TSN and PoseC3D.
As an augmentation method, Fourier Domain Adaptation (FDA), for example, could reduce the domain gap between the synthetically generated and the real-world dataset, as sketched below.

Contents: 1 Introduction; 2 Theoretical Background; 3 Related Work; 4 Omnidirectional Synthetic Human Optical Flow; 5 Human Keypoints for Pose in Omnidirectional Images; 6 Human Activity Recognition in Indoor Scenarios; 7 Conclusion and Future Work; A Chapter 4: Flow Dataset Statistics; B Chapter 5: 3D Rotation Matrices; C Chapter 6: Network Training Parameters
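A minimal sketch of the FDA idea referenced above, following Yang and Soatto's formulation: swap the low-frequency amplitude spectrum of a source image with that of a target while keeping the source phase. The grayscale input shape and the beta value are assumptions:

```python
import numpy as np

def fda_transfer(source, target, beta=0.01):
    """Give `source` the low-frequency appearance of `target`.
    source, target: float arrays of identical shape (H, W)."""
    fft_src = np.fft.fftshift(np.fft.fft2(source))
    fft_trg = np.fft.fftshift(np.fft.fft2(target))
    amp_src, phase_src = np.abs(fft_src), np.angle(fft_src)
    amp_trg = np.abs(fft_trg)

    # Replace a centred low-frequency square of the amplitude spectrum.
    h, w = source.shape
    b = int(min(h, w) * beta)
    ch, cw = h // 2, w // 2
    amp_src[ch - b:ch + b + 1, cw - b:cw + b + 1] = \
        amp_trg[ch - b:ch + b + 1, cw - b:cw + b + 1]

    mixed = amp_src * np.exp(1j * phase_src)
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))
```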
