121

Behind the Scenes: Evaluating Computer Vision Embedding Techniques for Discovering Similar Photo Backgrounds

Dodson, Terryl Dwayne 11 July 2023 (has links)
Historical photographs can generate significant cultural and economic value, but often their subjects go unidentified. However, if analyzed correctly, visual clues in these photographs can open up new directions in identifying unknown subjects. For example, many 19th-century photographs contain painted backdrops that can be mapped to a specific photographer or location, but this research process is often manual, time-consuming, and unsuccessful. AI-based computer vision algorithms could be used to automatically identify painted backdrops or photographers, or to cluster photos with similar backdrops, in order to aid researchers. However, it is unknown which computer vision algorithms are feasible for painted backdrop identification or which techniques work better than others. We present three studies evaluating four different types of image embeddings – Inception, CLIP, MAE, and pHash – across a variety of metrics and techniques. We find that a workflow using CLIP embeddings combined with a background classifier and simulated user feedback performs best. We also discuss implications for human-AI collaboration in visual analysis and new possibilities for digital humanities scholarship. / Master of Science / Historical photographs can generate significant cultural and economic value, but often their subjects go unidentified. However, if analyzed correctly, clues in these photographs can open up new directions in identifying unknown subjects. For example, many 19th-century photographs contain painted backdrops that can be mapped to a specific photographer or location, but this research process is often manual, time-consuming, and unsuccessful. Artificial Intelligence-based computer vision techniques could be used to automatically identify painted backdrops or photographers, or to group together photos with similar backdrops, in order to aid researchers. However, it is unknown which computer vision techniques are feasible for painted backdrop identification or which techniques work better than others. We present three studies comparing four different types of computer vision techniques – Inception, CLIP, MAE, and pHash – across a variety of metrics. We find that a workflow that combines the CLIP computer vision technique, software that automatically classifies photo backgrounds, and simulated human feedback performs best. We also discuss implications for collaboration between humans and AI for analyzing images, and new possibilities for academic research combining technology and history.
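
A minimal sketch of the retrieval idea described above: embed each photograph with CLIP and rank candidates by cosine similarity to a query image. This only illustrates the general embedding-plus-similarity step, not the thesis's full workflow (background classifier, simulated user feedback); the model checkpoint is the standard Hugging Face CLIP release, and the image paths are hypothetical placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    """Return L2-normalized CLIP image embeddings for a list of image paths."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Placeholder file names; the first image acts as the query photograph.
photo_paths = ["query.jpg", "a.jpg", "b.jpg", "c.jpg"]
emb = embed(photo_paths)
sims = emb[0] @ emb[1:].T                      # cosine similarity, query vs. rest
ranking = sims.argsort(descending=True)        # most similar backdrops first
print([photo_paths[1:][i] for i in ranking.tolist()])
```
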
122

Convolutional Neural Networks for Predicting Blood Glucose Levels from Nerve Signals

Say, Daniel; Spang Dyhrberg Nielsen, Frederik January 2024 (has links)
Convolutional Neural Networks (CNNs) have traditionally been used for image analysis and computer vision and are known for their ability to detect complex patterns in data. This report studies an application of CNNs within bioelectronic medicine, namely predicting blood glucose levels using nerve signals. Nerve signals and blood glucose levels were measured on a mouse before and after administration of glucose injections. The nerve signals were measured by placing 16 voltage-measuring electrodes on the vagus nerve of the mouse. The obtained nerve signal data was segmented into time intervals of 5 ms and aligned with the corresponding glucose measurements. Two LeNet-5-based CNN architectures, one 1-dimensional and one 2-dimensional, were implemented and trained on the data. Evaluation of the models’ performance was based on the mean squared error, the mean absolute error, and the R²-score of a simple moving average over the dataset. Both models had promising performance, with an R²-score above 0.92, suggesting a strong correlation between nerve signals and blood glucose levels. The difference in performance between the 1-dimensional and the 2-dimensional model was insignificant. These results highlight the potential of using CNNs in bioelectronic medicine for the prediction of physiological parameters from nerve signal data.
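
A hedged sketch of what a LeNet-5-style 1-D CNN for this regression task could look like. The 16 input channels come from the abstract; the segment length (here 100 samples per 5 ms window), layer sizes, and activations are assumptions for illustration, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class GlucoseCNN1D(nn.Module):
    def __init__(self, in_channels=16, seg_len=100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 6, kernel_size=5), nn.Tanh(),
            nn.AvgPool1d(2),
            nn.Conv1d(6, 16, kernel_size=5), nn.Tanh(),
            nn.AvgPool1d(2),
        )
        with torch.no_grad():  # infer the flattened size from a dummy segment
            n = self.features(torch.zeros(1, in_channels, seg_len)).numel()
        self.regressor = nn.Sequential(
            nn.Flatten(), nn.Linear(n, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, 1),  # single regressed blood glucose value
        )

    def forward(self, x):
        return self.regressor(self.features(x))

model = GlucoseCNN1D()
batch = torch.randn(8, 16, 100)   # 8 segments of 16-channel nerve signal
print(model(batch).shape)         # torch.Size([8, 1])
```
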
123

Fully Convolutional Neural Networks for Pixel Classification in Historical Document Images

Stewart, Seth Andrew 01 October 2018 (has links)
We use a Fully Convolutional Neural Network (FCNN) to classify pixels in historical document images, enabling the extraction of high-quality, pixel-precise and semantically consistent layers of masked content. We also analyze a dataset of hand-labeled historical form images of unprecedented detail and complexity. The semantic categories we consider in this new dataset include handwriting, machine-printed text, dotted and solid lines, and stamps. Segmentation of document images into distinct layers allows handwriting, machine print, and other content to be processed and recognized discriminatively, and therefore more intelligently than might be possible with content-unaware methods. We show that an efficient FCNN with relatively few parameters can accurately segment documents having similar textural content when trained on a single representative pixel-labeled document image, even when layouts differ significantly. In contrast to the overwhelming majority of existing semantic segmentation approaches, we allow multiple labels to be predicted per pixel location, which allows for direct prediction and reconstruction of overlapped content. We perform an analysis of prevalent pixel-wise performance measures, and show that several popular performance measures can be manipulated adversarially, yielding arbitrarily high measures based on the type of bias used to generate the ground-truth. We propose a solution to the gaming problem by comparing absolute performance to an estimated human level of performance. We also present results on a recent international competition requiring the automatic annotation of billions of pixels, in which our method took first place.
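
A minimal sketch of the multi-label idea described above: a small fully convolutional network emits an independent sigmoid score per class at every pixel, so overlapped content (e.g., handwriting crossing a printed line) can carry several labels at once. The class list follows the abstract; the architecture is an illustrative stand-in, not the thesis's network.

```python
import torch
import torch.nn as nn

CLASSES = ["handwriting", "machine_print", "line", "stamp"]  # from the abstract

class TinyFCN(nn.Module):
    def __init__(self, n_classes=len(CLASSES)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_classes, 1),   # per-pixel, per-class logits
        )

    def forward(self, x):
        return self.net(x)                 # shape: (B, n_classes, H, W)

model = TinyFCN()
img = torch.randn(1, 1, 256, 256)          # grayscale document crop
logits = model(img)
# BCEWithLogitsLoss treats each class independently, enabling multi-label pixels
loss_fn = nn.BCEWithLogitsLoss()
target = torch.randint(0, 2, logits.shape).float()  # dummy multi-hot mask
print(loss_fn(logits, target))
```
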
125

Ensembles of Single Image Super-Resolution Generative Adversarial Networks

Castillo Araújo, Victor January 2021 (has links)
Generative Adversarial Networks have been used to obtain state-of-the-art results for low-level computer vision tasks like single image super-resolution; however, they are notoriously difficult to train due to the instability of the competing minimax framework. Additionally, traditional ensembling mechanisms cannot be effectively applied to these types of networks because of the resources they require at inference time and the complexity of their architectures. In this thesis, an alternative method is proposed: ensembles are created by interpolating in the parameter space of individual models that are more stable and easier to train. The resulting ensembles are found to produce better results than the initial individual models when evaluated using perceptual metrics as a proxy for human judges. This method can be used as a framework to train GANs with perceptual results competitive with state-of-the-art alternatives.
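
A minimal sketch of the ensembling-by-interpolation idea, assuming two independently trained generators of identical architecture: their parameters are linearly interpolated to produce a single merged model, avoiding the inference-time cost of output-level ensembles. The tiny generator here is a stand-in, not the thesis's super-resolution architecture.

```python
import torch
import torch.nn as nn

def interpolate_state_dicts(sd_a, sd_b, alpha=0.5):
    """Linear interpolation in parameter space: (1 - alpha)*A + alpha*B."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

# Stand-in generator; real SR generators are much larger.
def make_gen():
    return nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, 3, 3, padding=1))

gen_a, gen_b = make_gen(), make_gen()   # pretend both were trained separately
merged = make_gen()
merged.load_state_dict(
    interpolate_state_dicts(gen_a.state_dict(), gen_b.state_dict(), alpha=0.5))

x = torch.randn(1, 3, 32, 32)
print(merged(x).shape)                  # torch.Size([1, 3, 32, 32])
```

The merged model runs at the cost of a single network, which is the point of interpolating in parameter space rather than averaging outputs.
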
126

Deep Learning based Video Super-Resolution in Computer Generated Graphics

Jain, Vinit January 2020 (has links)
Super-Resolution is a widely studied problem in the field of computer vision, where the purpose is to increase the resolution of, or super-resolve, image data. In Video Super-Resolution, maintaining temporal coherence for consecutive video frames requires fusing information from multiple frames to super-resolve one frame. Current deep learning methods perform video super-resolution, yet most of them focus on natural datasets. In this thesis, we use a recurrent back-projection network on a dataset of computer-generated graphics, with example applications including upsampling low-resolution cinematics for the gaming industry. The dataset comes from a variety of gaming content, rendered in (3840 x 2160) resolution. The objective of the network is to produce the upscaled version of a low-resolution frame by learning from an input combination of the low-resolution frame, a sequence of neighboring frames, and the optical flow between each neighboring frame and the reference frame. Under the baseline setup, we train the model to perform 2x upsampling from (1920 x 1080) to (3840 x 2160) resolution. Compared against the bicubic interpolation method, our model achieved better results by a margin of 2 dB in Peak Signal-to-Noise Ratio (PSNR), 0.015 in Structural Similarity Index Measure (SSIM), and 9.3 in the Video Multi-method Assessment Fusion (VMAF) metric. In addition, we demonstrate the susceptibility of neural-network performance to changes in image compression quality, and the inability of distortion metrics to capture perceptual details accurately.
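
A hedged sketch of the evaluation baseline only: 2x bicubic upscaling scored with PSNR against the ground-truth frame. The recurrent back-projection network itself is not reproduced, and the tensor here is a small stand-in for the (3840 x 2160) frames described above.

```python
import torch
import torch.nn.functional as F

def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for tensors scaled to [0, max_val]."""
    mse = F.mse_loss(pred, target)
    return 10 * torch.log10(max_val ** 2 / mse)

hr = torch.rand(1, 3, 270, 480)   # small stand-in for a 3840 x 2160 frame
lr = F.interpolate(hr, scale_factor=0.5, mode="bicubic", align_corners=False)
up = F.interpolate(lr, scale_factor=2, mode="bicubic",
                   align_corners=False).clamp(0, 1)
print(f"bicubic PSNR: {psnr(up, hr).item():.2f} dB")
```
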
127

GVT-BDNet: Convolutional Neural Network with Global Voxel Transformer Operators for Building Damage Assessment

Remondini, Leonardo January 2021 (has links)
Natural disasters strike anywhere, disrupting local communication and transportation infrastructure and making the process of assessing specific local damage difficult, dangerous, and slow. The goal of Building Damage Assessment (BDA) is to quickly and accurately estimate the location, cause, and severity of damage to maximize the efficiency of rescuers and the number of lives saved. In current machine learning BDA solutions, attention operators are the most recent innovation adopted to increase the generalizability and overall performance of Convolutional Neural Networks on the BDA task. However, these solutions exploit attention operators tailored to the specific task and neural network architecture, making them hard to apply to other scenarios. In our research, we contribute to the BDA literature while also addressing this limitation. We propose Global Voxel Transformer Operators (GVTOs): flexible attention operators, originally proposed for Augmented Microscopy, that can replace up-sampling, down-sampling, and size-preserving convolutions within either a U-Net or a general CNN architecture without any limitation. Unlike local operators such as convolutions, GVTOs can aggregate global information and have input-specific weights at inference time, improving generalizability, as recent literature has already shown. We applied GVTOs to a state-of-the-art BDA model and named the result GVT-BDNet. We trained and evaluated the proposed neural network on the xBD dataset, the largest and most complete dataset for BDA. We compared GVT-BDNet's performance with the baseline architecture (BDNet) and observed that the former improves damaged-building segmentation by 0.11. Moreover, GVT-BDNet achieves state-of-the-art performance on a 10% split of the xBD training dataset and on the xBD test dataset, with overall F1-scores of 0.80 and 0.79, respectively. To evaluate architectural consistency, we also tested BDNet's and GVT-BDNet's generalizability on another segmentation task, Tree & Shadow segmentation, where both models achieved good overall performance, with F1-scores of 0.79 and 0.785, respectively.
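
To give a flavor of a global, size-preserving attention operator of the kind the abstract contrasts with convolutions, here is a generic non-local-style block in which every output location attends over all input locations. This is an illustrative sketch only, not the actual GVTO definition from the thesis.

```python
import torch
import torch.nn as nn

class GlobalAttention2d(nn.Module):
    """Size-preserving global attention over all spatial positions."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)        # (B, HW, C)
        k = self.k(x).flatten(2)                        # (B, C, HW)
        v = self.v(x).flatten(2).transpose(1, 2)        # (B, HW, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)  # (B, HW, HW), global
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out + x                                  # residual connection

block = GlobalAttention2d(16)
print(block(torch.randn(1, 16, 32, 32)).shape)          # torch.Size([1, 16, 32, 32])
```

Because the attention weights are computed from the input itself, the operator has input-specific weights at inference time, the property the abstract highlights.
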
128

Using Satellite Images and Self-Supervised Deep Learning to Detect Water Hidden under Vegetation

Iakovidis, Ioannis January 2024 (has links)
In recent years, the wide availability of high-resolution satellite images has made the remote monitoring of water resources all over the world possible. While the detection of open water from satellite images is relatively easy, a significant percentage of the water extent of wetlands is covered by vegetation. Fortunately, radar signals can penetrate vegetation, which makes it possible to detect water hidden under vegetation in satellite radar images. Convolutional Neural Networks have shown great success in this task. However, these models require large amounts of manually annotated satellite images, which are slow and expensive to produce. Self-supervised learning is a field of machine learning that aims to train models without the use of annotated data. In this paper, we use self-supervised training methods to train a Convolutional Neural Network to detect water from satellite images without the use of annotated data. We use a combination of deep clustering and negative sampling based on the paper ”Unsupervised Single-Scene Semantic Segmentation for Earth Observation”, and we extend that work by changing the clustering loss and the model architecture. After observing high variance in our models' performance, we also implemented an ensemble variant of the model to obtain more consistent results. Our final ensemble of self-supervised models outperforms a single supervised model, showing the power of self-supervision.
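
A minimal sketch of the deep-clustering step at the heart of such self-supervised training: pixel embeddings from an encoder are clustered into pseudo-classes, and the cluster assignments can then supervise the encoder without manual annotation. The real method adds negative sampling and an ensemble; this shows only the clustering step, on random features and an untrained encoder, purely for illustration.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

encoder = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 16, 3, padding=1))
image = torch.randn(1, 4, 64, 64)                # e.g., a 4-band satellite tile
with torch.no_grad():
    feats = encoder(image)                        # (1, 16, 64, 64)
pixels = feats.squeeze(0).flatten(1).T.numpy()    # (4096, 16) per-pixel features

# Two pseudo-classes, e.g. "water" vs. "not water" (clusters are unnamed).
labels = KMeans(n_clusters=2, n_init=10).fit_predict(pixels)
pseudo_mask = torch.from_numpy(labels).reshape(64, 64)
print(pseudo_mask.shape, pseudo_mask.unique())
# pseudo_mask can now act as the target of a standard cross-entropy loss.
```
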
129

Deep neural networks and their implementation

Vojt, Ján January 2016 (has links)
Deep neural networks represent an effective and universal model capable of solving a wide variety of tasks. This thesis focuses on three different types of deep neural networks: the multilayer perceptron, the convolutional neural network, and the deep belief network. All of the discussed network models are implemented on parallel hardware and thoroughly tested for various choices of network architecture and parameters. The implemented system is accompanied by detailed documentation of the architectural decisions and proposed optimizations. The efficiency of the implemented framework is confirmed by the results of the performed tests. A significant part of this thesis also covers additional testing of other existing frameworks that support deep neural networks. This comparison indicates that our implementations of multilayer perceptrons and convolutional neural networks outperform the tested rival frameworks. The deep belief network implementation performs slightly better for RBM layers with up to 1000 hidden neurons, but has noticeably inferior performance for larger RBM layers when compared to the tested rival framework.
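
As a hedged illustration of the kind of throughput benchmark such framework comparisons rely on, the sketch below times forward-plus-backward passes of a multilayer perceptron on a fixed batch. The layer sizes and batch size are arbitrary assumptions; the thesis's own benchmarks span MLPs, CNNs, and DBNs across several frameworks.

```python
import time
import torch
import torch.nn as nn

mlp = nn.Sequential(nn.Linear(784, 1000), nn.ReLU(),
                    nn.Linear(1000, 1000), nn.ReLU(),
                    nn.Linear(1000, 10))
opt = torch.optim.SGD(mlp.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(256, 784), torch.randint(0, 10, (256,))

start = time.perf_counter()
for _ in range(100):                 # 100 training steps on the same batch
    opt.zero_grad()
    loss_fn(mlp(x), y).backward()
    opt.step()
elapsed = time.perf_counter() - start
print(f"{100 * 256 / elapsed:.0f} samples/s")   # crude throughput figure
```
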
130

Bone Fragment Segmentation Using Deep Interactive Object Selection

Estgren, Martin January 2019 (has links)
In recent years, semantic segmentation models utilizing Convolutional Neural Networks (CNNs) have seen significant success on multiple different segmentation problems. Models such as U-Net have produced promising results within the medical field for both regular 2D and volumetric imaging, rivalling some of the best classical segmentation methods. In this thesis, we examined the possibility of using a convolutional neural network-based model to segment discrete bone fragments in CT volumes with segmentation hints provided by a user. We additionally examined different classical segmentation methods used in a post-processing refinement stage and their effect on segmentation quality. We compared the performance of our model to similar approaches and provide insight into how the interactive aspect of the model affected the quality of the result. We found that the combined approach of interactive segmentation and deep learning produced results on par with some of the best methods presented, provided there was an adequate amount of annotated training data. We additionally found that the number of segmentation hints provided to the model by the user significantly affected the quality of the result, with the result converging at around 8 provided hints.
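
A minimal sketch of how user hints are commonly fed to such a network, in the spirit of Deep Interactive Object Selection: clicks are rendered as extra channels (one for foreground hints, one for background hints) and concatenated to the CT slice before it enters the segmentation model. The binary-disk hint encoding and the placeholder network are assumptions; distance maps are another common encoding, and the thesis may differ in both respects.

```python
import torch
import torch.nn as nn

def hint_channel(shape, clicks, radius=3):
    """Render click coordinates as a binary disk map of the given shape."""
    h, w = shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    ch = torch.zeros(h, w)
    for cy, cx in clicks:
        ch[(ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2] = 1.0
    return ch

ct_slice = torch.randn(1, 1, 128, 128)               # one CT slice
fg = hint_channel((128, 128), [(40, 52)])            # clicks on the fragment
bg = hint_channel((128, 128), [(90, 90), (10, 100)]) # clicks on background
x = torch.cat([ct_slice, fg[None, None], bg[None, None]], dim=1)  # (1, 3, H, W)

net = nn.Conv2d(3, 1, 3, padding=1)   # placeholder for a U-Net-style model
print(net(x).shape)                   # torch.Size([1, 1, 128, 128])
```
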
