101

Unsupervised Feature Extraction of Clothing Using Deep Convolutional Variational Autoencoders / Oövervakad extrahering av kännetecknande drag av kläder genom djupa självkodande neurala faltningsnätverk

Blom, Fredrik January 2018 (has links)
As online retail continues to grow, large amounts of valuable data are generated, such as transaction and search history and, specifically for fashion retail, similarly structured images of clothing. By using unsupervised learning, it is possible to tap into this almost unlimited supply of data. This thesis set out to determine to what extent generative models – in particular, deep convolutional variational autoencoders – can be used to automatically extract representative features from images of clothing in a completely unsupervised manner. In reviewing variations of the autoencoder, both in terms of reconstruction quality and the ability to generate new realistic samples, results suggest that there exists an optimal size of the latent vector in relation to the image data complexity. Furthermore, by weighting the latent loss and generation loss in the loss function, it was possible to disentangle the learned features such that each feature captured a unique defining characteristic of clothing items (here t-shirts and tops).
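The weighting of the latent (KL) term against the reconstruction (generation) term described above is the β-VAE idea. Below is a minimal PyTorch sketch of such a loss, assuming Bernoulli reconstruction and a Gaussian latent; the function name and the value of beta are illustrative, since the thesis's code is not reproduced here.

```python
import torch
import torch.nn.functional as F

def weighted_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Reconstruction (generation) loss plus a beta-weighted KL (latent) loss.

    With beta > 1 the KL term dominates, pushing the encoder toward a
    factorized posterior, which is what encourages disentangled features.
    """
    # Reconstruction term: how well the decoder reproduces the input image
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I)
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + beta * kl
```

Sweeping beta trades reconstruction quality against disentanglement, matching the trade-off between the two loss terms that the abstract reports.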
102

Deep Perceptual Loss for Improved Downstream Prediction

Grund Pihlgren, Gustav January 2021 (has links)
No description available.
103

Benchmarking structure from motion algorithms with video footage taken from a drone against laser-scanner generated 3D models

Martell, Angel Alfredo January 2017 (has links)
Structure from motion is a novel approach to generating 3D models of objects and structures. The dataset simply consists of a series of images of an object taken from different positions. The ease of data acquisition and the wide array of available algorithms make the technique easily accessible. The structure from motion method identifies features in all the images of the dataset, such as edges with gradients in multiple directions, matches these features between all the images, and then computes the relative motion the camera underwent between each pair of images. From the correlated features it builds a 3D model and produces a 3D point cloud with colour information of the scanned object. Different implementations of the structure from motion method use different approaches to solve the feature-correlation problem between the images of the dataset, different methods for detecting features, and different alternatives for sparse and dense reconstruction. These differences produce variations in the final output across distinct algorithms. This thesis benchmarked these algorithms in terms of accuracy and processing time. For this purpose, a terrestrial 3D laser scanner was used to scan structures and buildings, generating a ground-truth reference against which the structure from motion algorithms were compared. A video feed was then captured from a drone with a built-in camera flying around each structure or building, providing the input for the structure from motion algorithms. Different structures were considered according to how rich or poor in features they are, since this impacts the result of the structure from motion algorithms. The 3D point clouds generated by the structure from motion algorithms were analysed with a tool such as CloudCompare to benchmark their similarity to the laser-scanner data, and the runtime was recorded for comparison across all algorithms. Subjective criteria were also assessed, such as how easy each algorithm is to use and how complete its model looks compared to the others. The comparison found no absolute best algorithm, since each algorithm excels in different aspects. Some algorithms can generate a model very fast, scaling execution time linearly with the size of their input, but at the expense of accuracy. Others take a long time for dense reconstruction but generate almost complete models even in the presence of featureless surfaces, such as COLMAP's modified PatchMatch algorithm. The structure from motion methods are able to generate models with an accuracy of up to 3 cm when scanning a simple building, with Visual Structure from Motion and Open Multi-View Environment ranking among the most accurate. It is worth highlighting that the accuracy error grows as scene complexity increases. Finally, it was found that the structure from motion method cannot correctly reconstruct structures with reflective surfaces or repetitive patterns when the images are taken from mid to close range, where errors can reach 1 m on a large structure.
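To make the pipeline the abstract describes concrete (feature detection, matching, relative camera motion, triangulation), here is a hedged two-view sketch using OpenCV. It is not any of the benchmarked implementations, and the image file names and camera intrinsics K are placeholder assumptions.

```python
import cv2
import numpy as np

# Detect and match SIFT features between two frames, then recover the
# relative camera motion and triangulate a sparse 3D point cloud.
img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test keeps only distinctive matches
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])  # assumed intrinsics
E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)  # relative camera motion

# Triangulate the correlated features into 3D points
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
points3d = (pts4d[:3] / pts4d[3]).T
```

A full structure-from-motion system repeats this over all image pairs, bundle-adjusts the result, and then densifies the sparse cloud.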
104

Online Camera-IMU Calibration

Karlhede, Arvid January 2022 (has links)
This master's thesis project was carried out with Saab Dynamics in Linköping in the spring of 2022 and aims to perform an online IMU-camera calibration using an AprilTag board. Experiments are conducted on two types of datasets: the public EuRoC dataset and internal datasets from Saab. The calibration is done iteratively by solving a series of nonlinear optimization problems without any initial knowledge of the sensor configuration. The method is largely based on work by Huang and collaborators. Beyond finding the transformation between the IMU and the camera, the biases in the IMU and the time delay between the two sensors are also explored. By comparing the resulting transformation with Kalibr, the current state-of-the-art offline calibration toolbox, it is possible to conclude that the model can find and correct for the biases in the gyroscope; it is therefore important to include these biases in the model. The model can roughly find the time shift between the two sensors but has more difficulty correcting for it. The thesis also explores how to compile a good dataset for calibration. Results show that it is desirable to avoid rapid movements, as well as images gathered at distances from the AprilTag board that vary a lot. A shorter exposure time also helps avoid losing AprilTag detections.
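As a simplified illustration of the time-delay part of the problem, one can scan the correlation between the gyroscope's angular-rate magnitude and the angular rate differentiated from the AprilTag camera poses. This is an assumed simplification, not the thesis's nonlinear-optimization method; all names and the search window are illustrative.

```python
import numpy as np

def estimate_time_offset(t_cam, w_cam, t_imu, w_imu, search=0.1, step=0.001):
    """Coarse camera-IMU time shift by correlation scanning.

    w_cam: angular-rate magnitude differentiated from AprilTag poses,
    w_imu: gyroscope angular-rate magnitude (bias already removed).
    """
    best_d, best_score = 0.0, -np.inf
    for d in np.arange(-search, search, step):
        # Resample the IMU signal onto the shifted camera timestamps
        w_shifted = np.interp(t_cam + d, t_imu, w_imu)
        score = np.corrcoef(w_cam, w_shifted)[0, 1]
        if score > best_score:
            best_d, best_score = d, score
    return best_d
```

The thesis instead folds the time shift, extrinsics and biases into one iterative estimation; a scan like this would at most provide an initial guess.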
105

Event-Based Visual SLAM : An Explorative Approach

Rideg, Johan January 2023 (has links)
Simultaneous Localization And Mapping (SLAM) is an important topic within the field of robotics, aiming to localize an agent in an unknown or partially known environment while simultaneously mapping the environment. The ability to perform robust SLAM is especially important in hazardous environments such as natural disasters, firefighting and space exploration, where human exploration may be too dangerous or impractical. In recent years, neuromorphic cameras have become commercially available. This new type of sensor does not output conventional frames but instead an asynchronous signal of events at microsecond resolution, and is capable of capturing details in complex lighting scenarios where a standard camera would be either under- or overexposed, making neuromorphic cameras a promising solution in situations where standard cameras struggle. This thesis explores a set of different approaches to virtual frames, a frame-based representation of events, in the context of SLAM. UltimateSLAM, a project fusing events, grayscale frames and IMU data, is investigated using virtual frames at fixed and varying frame rates, both with and without motion compensation. The resulting trajectories are compared to the trajectories produced when using grayscale frames, and the numbers of detected and tracked features are compared. We also use a traditional visual SLAM project, ORB-SLAM, to investigate Gaussian-weighted virtual frames and grayscale frames reconstructed from the event stream using a recurrent network model. While virtual frames can be used for SLAM, the event camera is not a plug-and-play sensor: constructing virtual frames requires a good choice of parameters, relying on pre-existing knowledge of the scene.
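To make the notion of a virtual frame concrete, the sketch below accumulates events into an image, optionally with the Gaussian weighting mentioned above. The event layout (t, x, y, polarity) and all parameter names are assumptions; the thesis's exact constructions may differ.

```python
import numpy as np

def virtual_frame(events, shape, window=0.03, t_ref=None, sigma=None):
    """Accumulate events (t, x, y, polarity) into a frame of given shape.

    sigma=None: every event in the time window counts equally.
    sigma set:  events are Gaussian-weighted by their age relative to
                t_ref, so recent events dominate the frame.
    """
    t, x, y, p = events.T
    if t_ref is None:
        t_ref = t.max()
    in_win = (t >= t_ref - window) & (t <= t_ref)
    age = t_ref - t[in_win]
    w = np.ones_like(age) if sigma is None else np.exp(-0.5 * (age / sigma) ** 2)
    frame = np.zeros(shape, dtype=np.float32)
    # Signed accumulation: ON events (p=1) add, OFF events (p=0) subtract
    np.add.at(frame, (y[in_win].astype(int), x[in_win].astype(int)),
              w * (2 * p[in_win] - 1))
    return frame
```

The window and sigma parameters are exactly the kind of scene-dependent choices the abstract's closing remark refers to.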
106

Examining Difficulties in Weed Detection

Ahlqvist, Axel January 2022 (has links)
Automatic detection of weeds could be used for more efficient weed control in agriculture. In this master's thesis, weed detectors have been trained and examined on data collected by RISE to investigate whether an accurate weed detector could be trained on the collected data. When only using annotations of the weed class Creeping thistle for training and evaluation, a detector achieved a mAP of 0.33. When using four classes of weed, a detector was trained to a mAP of 0.07. The performance was worse than in a previous study also dealing with weed detection. Hypotheses for why the performance was lacking were examined. Experiments indicated that the problem could not fully be explained by the model being underfitted, by the objects' backgrounds being too similar to the foreground, or by the quality of the annotations being too low. The performance was better when training the model with as much data as possible than when only selected segments of the data were used.
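For reference, the mAP numbers above are means over classes of average precision, which is in turn built from IoU-thresholded matches between predicted and ground-truth boxes. A minimal generic sketch of these two building blocks follows; it is not the thesis's evaluation code.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def average_precision(recalls, precisions):
    """All-point interpolated AP; mAP averages this over classes."""
    r = np.concatenate([[0.0], recalls, [1.0]])
    p = np.concatenate([[0.0], precisions, [0.0]])
    p = np.maximum.accumulate(p[::-1])[::-1]  # monotone precision envelope
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```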
107

Precise Robot Navigation Between Fixed End and Starting Points - Combining GPS and Image Analysis

Balusulapalem, Hanumat Sri Naga Sai, Amarwani, Julie Rajkumar January 2024 (has links)
The utilization of image analysis and object detection spans various industries, serving purposes such as anomaly detection, automated workflows, and monitoring tool wear and tear. This thesis addresses the challenge of achieving precise robot navigation between fixed start and end points by combining GPS and image analysis. The underlying motivation for tackling this issue lies in facilitating the creation of immersive videos, mainly aimed at individuals with disabilities, enabling them to virtually explore diverse locations through a compilation of shorter video clips. The research delves into diverse models for object detection frameworks and tools, including NVIDIA DetectNet and YOLOv5. Following a comprehensive evaluation of their performance and accuracy, the thesis implements a prototype system utilizing an Elegoo Smart Robot Car, a camera, a GPS module, and an embedded NVIDIA Jetson Nano system. Performance metrics such as precision, recall, and mAP are employed to assess the models' effectiveness. The findings indicate that the system demonstrates high accuracy and speed in detection, exhibiting robustness across varying lighting conditions and camera settings.
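As a hedged sketch of the detection side of such a prototype, the snippet below loads a pretrained YOLOv5 model through torch.hub and applies a toy GPS-gated steering rule. The confidence threshold, the 640-pixel frame width, and the fusion logic are illustrative assumptions, not the thesis's implementation.

```python
import torch

# Pretrained YOLOv5 small model via torch.hub (downloads on first use)
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

results = model("frame.jpg")            # run detection on a camera frame
detections = results.pandas().xyxy[0]   # boxes, confidences, class names

# Toy fusion rule: steer toward a detected landmark only when the GPS
# fix says the robot is near a route waypoint (placeholder flag here).
near_waypoint = True
landmarks = detections[detections["confidence"] > 0.5]
if near_waypoint and not landmarks.empty:
    box = landmarks.iloc[0]
    cx = (box.xmin + box.xmax) / 2.0
    steer = "left" if cx < 320 else "right"  # assumes a 640 px wide frame
    print("steer", steer)
```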
108

Arboreal Radiance Fields : Investigating NeRF-Based Orthophotos in Forestry

Lissmats, Olof January 2024 (has links)
This thesis explores the potential of Neural Radiance Fields (NeRF) for generating orthophotos in forestry applications. Traditional orthophoto production methods, such as those implemented in Pix4D, require high image overlap and significant data collection. NeRF, a novel 3D scene reconstruction technique, shows potential for reducing these requirements by effectively reconstructing scenes with lower image overlaps. This study compares the orthophotos produced by NeRF and Pix4D using various degrees of image overlap, evaluating the results based on geometric accuracy, image quality, and robustness to data variations. The findings indicate that NeRF can produce orthophotos from low-overlap images with geometric accuracy comparable to orthophotos produced by Pix4D from high-overlap images, though with some trade-offs in image sharpness. These results suggest potential cost savings and operational efficiencies in forestry applications, providing a viable alternative to traditional photogrammetric techniques.
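The key geometric difference between an orthophoto and an ordinary NeRF render is the ray set: parallel nadir rays instead of a pinhole frustum. The sketch below generates such rays under assumed names and units; the thesis's actual rendering pipeline is not reproduced here.

```python
import numpy as np

def ortho_rays(x_range, y_range, gsd, altitude):
    """Parallel nadir rays for rendering an orthophoto from a NeRF.

    gsd is the ground sampling distance in metres per pixel. Every ray
    shares the same straight-down direction, so the rendered image has
    no perspective distortion, unlike a pinhole camera render.
    """
    xs = np.arange(x_range[0], x_range[1], gsd)
    ys = np.arange(y_range[0], y_range[1], gsd)
    xx, yy = np.meshgrid(xs, ys)
    origins = np.stack([xx, yy, np.full_like(xx, altitude)], axis=-1)
    dirs = np.broadcast_to([0.0, 0.0, -1.0], origins.shape)
    return origins.reshape(-1, 3), dirs.reshape(-1, 3)

# Each (origin, direction) pair would then be passed to the trained
# NeRF's volume-rendering routine to produce one orthophoto pixel.
```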
109

Self-Supervised Representation Learning for Early Breast Cancer Detection in Mammographic Imaging

Ågren, Kristofer January 2024 (has links)
The proposed master's thesis investigates the adaptability and efficacy of self-supervised representation learning (SSL) in medical image analysis, focusing on mammographic imaging to develop robust representation learning models. This research builds upon existing studies in mammographic imaging that have utilized contrastive learning and knowledge distillation-based self-supervised methods, focusing on SimCLR (Chen et al. 2020) and SimSiam (Chen et al. 2020), and evaluates approaches to increase classification performance in a transfer-learning setting. The thesis critically evaluates and integrates recent advancements in these SSL paradigms (Chhipa 2023, chapter 2) and incorporates additional SSL approaches. The core objective is to enhance robust generalization and label efficiency in medical imaging analysis, contributing to the broader field of AI-driven diagnostic methodologies. The proposed thesis will not only extend the current understanding of SSL in medical imaging but also aims to provide actionable insights that could be instrumental in enhancing breast cancer detection methodologies, thereby contributing significantly to the fields of medical imaging and cancer research.
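Of the two methods named, SimSiam has the more compact core: a stop-gradient negative cosine similarity between each view's predictor output and the other view's projection. A minimal PyTorch sketch follows; the function and argument names are assumed, not taken from the thesis.

```python
import torch.nn.functional as F

def simsiam_loss(p1, p2, z1, z2):
    """Symmetrized SimSiam loss for two augmented views of one image.

    p1, p2: predictor outputs; z1, z2: encoder projections. Detaching
    z blocks gradients through the target branch, which is what keeps
    the representations from collapsing without negative pairs.
    """
    def neg_cos(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()
    return 0.5 * neg_cos(p1, z2) + 0.5 * neg_cos(p2, z1)
```

In the mammography setting, the two views would be two augmentations of the same image patch, and the learned encoder would then be transferred to the downstream classification task.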
110

Visual Bird's-Eye View Object Detection for Autonomous Driving

Lidman, Erik January 2023 (has links)
In the field of autonomous driving a common scenario is to apply deep learning models on camera feeds to provide information about the surroundings. A recent trend is for such vision-based methods to be centralized, in that they fuse images from all cameras in one big model for a single comprehensive output. Designing and tuning such models is hard and time-consuming, in both development and training. This thesis aims to reproduce the results of a paper about a centralized vision-based model performing 3D object detection, called BEVDet. Additional goals are to ablate the technique of class-balanced grouping and sampling used in the model, to tune the model to improve generalization, and to change the detection head of the model to a Transformer decoder-based head. The findings include a successful reproduction of the results of the paper, while adding depth supervision to BEVDet establishes a baseline for the subsequent experiments. An increasing validation loss during most of the training indicates that there is room for improvement in the generalization of the model. Several different methods are tested in order to resolve the increasing validation loss, but they all fail to do so. The ablation study shows that the class-balanced grouping is important for the performance of the chosen configuration of the model, while the class-balanced sampling does not contribute significantly. Without extensive tuning, the replacement head gives performance similar to PETR, the model from which the head is adapted, but fails to match the performance of the baseline model. In addition, the model with the Transformer decoder-based head shows a converging validation loss, unlike the baseline model.
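Class-balanced grouping and sampling (CBGS) counteracts the long-tailed class distribution by drawing samples that contain rare classes more often. The sketch below is a simplified, assumed version of the sampling-weight computation only; the grouping half, which splits detection heads by class, is omitted.

```python
import numpy as np

def cbgs_sample_weights(sample_classes, num_classes):
    """Per-sample sampling weights, class-balanced-sampling style.

    sample_classes: one integer array of class ids per training sample.
    A sample's weight is the rarity of the rarest class it contains,
    so rare-class samples are drawn more often by a weighted sampler.
    """
    counts = np.bincount(np.concatenate(sample_classes),
                         minlength=num_classes).astype(float)
    class_w = 1.0 / np.maximum(counts, 1.0)  # rarity of each class
    return np.array([class_w[np.unique(c)].max() if len(c) else 0.0
                     for c in sample_classes])
```

These weights would feed a weighted random sampler; the ablation result above suggests that for this model the grouping half matters more than the sampling half.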
