51.
Deep Fusion of Imaging Modalities for Semantic Segmentation of Satellite Imagery
Sundelius, Carl (January 2018)
In this report I summarize my master's thesis work, in which I have investigated different approaches for fusing imaging modalities for semantic segmentation with deep convolutional networks. State-of-the-art methods for semantic segmentation of RGB images use pre-trained models, which are fine-tuned to learn task-specific deep features. However, the use of pre-trained model weights constrains the model input to images with three channels (e.g. RGB images). In some applications, e.g. classification of satellite imagery, other imaging modalities can complement the information from the RGB modality and thus improve the performance of the classification. In this thesis, semantic segmentation methods designed for RGB images are extended to handle multiple imaging modalities without compromising the benefits that pre-training on RGB datasets offers. In the experiments of this thesis, RGB images from satellites have been fused with the normalised difference vegetation index (NDVI) and a digital surface model (DSM). The evaluation shows that modality fusion can significantly improve the performance of semantic segmentation networks in comparison with a corresponding network with only RGB input. However, the different investigated approaches to fusing the modalities achieved similar performance. The conclusion of the experiments is that the fusion of imaging modalities is necessary, but that the method of fusion is of less importance.
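A minimal PyTorch sketch of one way such an input-level fusion can be implemented (an illustrative assumption, not the thesis code): an ImageNet-pretrained backbone, here torchvision's ResNet-50, is extended from three to five input channels so that NDVI and DSM can be stacked with RGB, while the pretrained RGB filters are kept.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Illustrative sketch, not the thesis implementation: extend a pretrained
# RGB backbone to accept RGB + NDVI + DSM (5 channels) without discarding
# the pretrained first-layer filters.
model = resnet50(weights="IMAGENET1K_V1")
old_conv = model.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3)

new_conv = nn.Conv2d(5, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = old_conv.weight  # reuse pretrained RGB filters
    # Initialize the NDVI and DSM channels with the mean RGB filter
    new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)
model.conv1 = new_conv

x = torch.randn(1, 5, 224, 224)  # RGB + NDVI + DSM stacked channel-wise
out = model(x)                   # backbone now consumes all five modalities
```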
52.
Source-Space Analyses in MEG/EEG and Applications to Explore Spatio-temporal Neural Dynamics in Human Vision
Yang, Ying (01 February 2017)
Human cognition involves dynamic neural activities in distributed brain areas. For studying such neural mechanisms, magnetoencephalography (MEG) and electroencephalography (EEG) are two important techniques, as they non-invasively detect neural activities with a high temporal resolution. Recordings by MEG/EEG sensors can be approximated as a linear transformation of the neural activities in the brain space (i.e., the source space). However, we only have a limited number of sensors compared with the many possible locations in the brain space; it is therefore challenging to estimate the source neural activities from the sensor recordings, in that we need to solve the underdetermined inverse problem of the linear transformation. Moreover, estimating source activities is typically an intermediate step, whereas the ultimate goal is to understand what information is coded and how information flows in the brain. This requires further statistical analysis of the source activities. For example, to study what information is coded in different brain regions and temporal stages, we often regress neural activities on some external covariates; to study dynamic interactions between brain regions, we often quantify the statistical dependence among the activities in those regions through “connectivity” analysis. Traditionally, these analyses are done in two steps: Step 1, solve the linear problem under some regularization or prior assumptions (e.g., each source location being independent); Step 2, do the regression or connectivity analysis. However, biases induced by the regularization in Step 1 cannot be corrected in Step 2 and may thus yield inaccurate regression or connectivity results. To tackle this issue, we present novel one-step methods for regression or connectivity analysis in the source space, where we explicitly model the dependence of the source activities on the external covariates (in the regression analysis) or the cross-region dependence (in the connectivity analysis), jointly with the source-to-sensor linear transformation. In simulations, we observed better performance by our models than by commonly used two-step approaches when our model assumptions were reasonably satisfied. Besides the methodological contribution, we also applied our methods in a real MEG/EEG experiment, studying the spatio-temporal neural dynamics in the visual cortex. The human visual cortex is hypothesized to have a hierarchical organization, where low-level regions extract low-level features such as local edges, and high-level regions extract semantic features such as object categories. However, details about the spatio-temporal dynamics are less understood. Here, using both the two-step and our one-step regression models in the source space, we correlated neural responses to naturalistic scene images with the low-level and high-level features extracted from a well-trained convolutional neural network. Additionally, we studied the interaction between regions along the hierarchy using the two-step and our one-step connectivity models. The results from the two-step and one-step methods were generally consistent; however, the one-step methods demonstrated some intriguing advantages in the regression analysis, and slightly different patterns in the connectivity analysis.
In the consistent results, we not only observed an early-to-late shift from low-level to high-level features, which supports feedforward information flow along the hierarchy, but also some novel evidence indicating non-feedforward information flow (e.g., top-down feedback). These results can help us better understand the neural computation in the visual cortex. Finally, we compared the empirical sensitivity of MEG and EEG in this experiment in detecting dependence between neural responses and visual features. Our results show that the less costly EEG was able to achieve sensitivity comparable to MEG when the number of observations was about twice that in MEG. These results can help researchers choose empirically between MEG and EEG when planning their experiments with limited budgets.
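A hedged numerical sketch of the two-step pipeline discussed above (the forward model, noise level, and regularizer are synthetic assumptions chosen for illustration): Step 1 inverts the underdetermined sensor model with an L2-regularized, minimum-norm-style inverse, and Step 2 regresses each estimated source on an external covariate. The shrinkage introduced in Step 1 biases the Step-2 coefficients, which is the issue the one-step models are designed to avoid.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_sources, n_trials = 64, 500, 200

# Synthetic forward (source-to-sensor) matrix and covariate-driven sources
A = rng.standard_normal((n_sensors, n_sources))
covariate = rng.standard_normal(n_trials)
S = np.outer(covariate, rng.standard_normal(n_sources) * 0.5)
Y = S @ A.T + 0.1 * rng.standard_normal((n_trials, n_sensors))

# Step 1: L2-regularized inverse, s_hat = A^T (A A^T + lam I)^{-1} y
lam = 1.0
K = A.T @ np.linalg.inv(A @ A.T + lam * np.eye(n_sensors))
S_hat = Y @ K.T  # (n_trials, n_sources) source estimates

# Step 2: regress each source estimate on the covariate
beta = (covariate @ S_hat) / (covariate @ covariate)
```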
53.
Deep Convolutional Neural Networks For Detecting Cellular Changes Due To Malignancy
Wieslander, Håkan; Forslid, Gustav (January 2017)
Discovering cancer at an early stage is an effective way to increase the chance of survival. However, since most screening processes are done manually, they are time-consuming and thus costly. One way of automating the screening process could be to classify cells using Convolutional Neural Networks. Convolutional Neural Networks have been shown to achieve high accuracy in image classification tasks. This thesis investigates whether Convolutional Neural Networks can be used as a tool to detect cellular changes due to malignancy in the oral cavity and uterine cervix. Two datasets containing oral cells and two datasets containing cervical cells were used. The cells were divided into normal and abnormal cells for a binary classification. The performance was evaluated for two different network architectures, ResNet and VGG. For the oral datasets the accuracy varied between 78% and 82% correctly classified cells, depending on the dataset and network. For the cervical datasets the accuracy varied between 84% and 86%, depending on the dataset and network. These results indicate a high potential for classifying abnormalities in oral and cervical cells. ResNet was shown to be the preferable network, with a higher accuracy and a smaller standard deviation.
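A short sketch of the general fine-tuning setup (network depth and hyperparameters here are assumptions, not the exact thesis configuration): a pretrained ResNet has its classification head replaced with a two-way output for the normal/abnormal decision.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Hedged sketch: fine-tune an ImageNet-pretrained ResNet for binary
# normal/abnormal cell classification.
model = resnet50(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 2)  # normal vs. abnormal

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# Standard supervised loop over labeled cell-image batches (dataloader omitted):
# logits = model(images); loss = criterion(logits, labels); loss.backward(); ...
```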
54.
Navigability Assessment for Autonomous Systems Using Deep Neural Networks
Wimby Schmidt, Ebba (January 2017)
Automated navigability assessment based on image sensor data is an important concern in the design of autonomous robotic systems. The problem consists of finding a mapping from input data to the navigability status of different areas of the surrounding world. Machine learning techniques are often applied to this problem. This thesis investigates an approach to navigability assessment in the image plane, based on offline learning with deep convolutional neural networks, applied to RGB and depth data collected using a robotic platform. Training outputs were generated by manually marking out instances of near collision in the sequences and tracing the location of the near-collision frame back through the previous frames. Several combinations of network inputs were tried out. Inputs included grayscale gradient versions of the RGB frames, depth maps, image coordinate maps, and motion information in the form of a previous RGB frame or heading maps. Some improvement compared to simple depth thresholding was demonstrated, mainly in the handling of noise and missing pixels in the depth maps. The resulting networks appear to be mostly dependent on depth information; an attempt to train a network without the depth frames was unsuccessful, and a network trained on the depth frames alone performed similarly to networks trained with additional inputs. An unsuccessful attempt was also made at training a network towards a more motion-dependent navigability concept. This was done by including training frames captured as the robot was moving away from the obstacle, with the corresponding training outputs marked as obstacle-free.
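A toy sketch of this channel-stacking scheme (channel counts and layer sizes are assumptions for illustration): the modalities are concatenated channel-wise and passed through a small fully convolutional network that outputs a per-pixel navigability logit.

```python
import torch
import torch.nn as nn

# Illustrative sketch: per-pixel navigability from stacked input modalities,
# e.g. grayscale gradient (1), depth (1), image coordinates (2), heading (1).
class NavigabilityNet(nn.Module):
    def __init__(self, in_channels=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1))  # one navigability logit per pixel

    def forward(self, x):
        return self.net(x)

x = torch.randn(1, 5, 120, 160)  # modalities stacked channel-wise
logits = NavigabilityNet()(x)    # (1, 1, 120, 160)
```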
55.
Predicting Energetic Material Properties and Investigating the Effect of Pore Morphology on Shock Sensitivity via Machine Learning
Casey, Alex Donald (28 July 2020)
An improved understanding of energy localization ("hot spots") is needed to improve the safety and performance of explosives. In this work I establish a variety of experimental and computational methods to aid in the investigation of hot spots. In particular, focus is centered on the implicit relationship between hot spots and energetic material sensitivity. To begin, I propose a technique to visualize and quantify the properties of a dynamic hot spot within an energetic composite subjected to ultrasonic mechanical excitation. The composite is composed of an optically transparent binder and a countable number of HMX crystals. The evolving temperature field is measured by observing the luminescence from embedded phosphor particles and subsequent application of the intensity ratio method. The spatial temperature precision is less than 2% of the measured absolute temperature in the temperature regime of interest (23–220 °C). The temperature field is mapped from within an HMX-binder composite under periodic mechanical excitation.

Following this experimental effort, I examine the statistics behind the most prevalent and widely used sensitivity test (at least within the energetic materials community) and suggest adaptations to generalize the approach to bimodal latent distributions. Bimodal latent distributions may occur when manufacturing processes are inconsistent or when competing initiation mechanisms are present.

Moving to simulation work, I investigate how the internal void structure of a solid explosive influences initiation behavior, specifically the criticality of isolated hot spots, in response to a shock insult. In the last decade, there has been a significant modeling and simulation effort to investigate the thermodynamic response of the shock-induced pore collapse process in energetic materials. However, the majority of these studies largely ignore the geometry of the pore and assume simplistic shapes, typically a sphere. In this work, the influence of pore geometry on the sensitivity of shocked HMX is explored. A collection of pore geometries is retrieved from micrographs of pressed HMX samples via scanning electron microscopy. The shock-induced collapse of these geometries is simulated using CTH, and the response is reduced to a binary "critical"/"sub-critical" result. The simulation results are used to assign to each pore geometry a minimum threshold velocity required to exhibit a critical response. The pore geometries are subsequently encoded into numerical representations, and a functional mapping from pore shape to threshold velocity is developed using supervised machine-learned models. The resulting models demonstrate good predictive capability, and their relative performance is explored. The established models are exposed via a web application to further investigate which shape features most heavily influence sensitivity.

Finally, I develop a convolutional neural network capable of directly parsing the 3D electronic structure of a molecule, described by spatial point data for charge density and electrostatic potential represented as a 4D tensor. This method effectively bypasses the need to construct complex representations, or descriptors, of a molecule. This is beneficial because the accuracy of a machine-learned model depends on the input representation. Ideally, input descriptors encode the essential physics and chemistry that influence the target property. Thousands of molecular descriptors have been proposed, and proper selection of features requires considerable domain expertise or exhaustive and careful statistical downselection. In contrast, deep learning networks are capable of learning rich data representations. This provides a compelling motivation to use deep learning networks to learn molecular structure-property relations from "raw" data. The convolutional neural network model is jointly trained on over 20,000 molecules that are potentially energetic materials (explosives) to predict dipole moment, total electronic energy, Chapman-Jouguet (C-J) detonation velocity, C-J pressure, C-J temperature, crystal density, HOMO-LUMO gap, and solid-phase heat of formation. To my knowledge, this demonstrates the first use of the complete 3D electronic structure for machine learning of molecular properties.
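A PyTorch sketch of the idea behind this final contribution (layer sizes and grid resolution are assumptions, not the published model): a 3D convolutional network consumes the two-channel voxelized electronic structure as a 4D tensor and regresses the eight target properties jointly.

```python
import torch
import torch.nn as nn

# Illustrative sketch: 3D CNN over a 4D tensor with two channels
# (charge density, electrostatic potential) on a 3D spatial grid.
class ElectronicStructureNet(nn.Module):
    def __init__(self, n_properties=8):  # dipole, C-J velocity, density, ...
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(2, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1))
        self.head = nn.Linear(64, n_properties)  # joint multi-property regression

    def forward(self, x):  # x: (batch, 2, D, H, W)
        return self.head(self.features(x).flatten(1))

x = torch.randn(4, 2, 32, 32, 32)         # batch of voxelized molecules
properties = ElectronicStructureNet()(x)  # (4, 8)
```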
56.
Sensor Fusion for 3D Object Detection for Autonomous Vehicles
Massoud, Yahya (14 October 2021)
Thanks to major advancements in hardware and computational power, sensor technology, and artificial intelligence, the race for fully autonomous driving systems is heating up. Facing countless challenging conditions and driving scenarios, researchers are tackling the most difficult problems in driverless cars. One of the most critical components is the perception module, which enables an autonomous vehicle to "see" and "understand" its surrounding environment. Given that modern vehicles can have a large number of sensors and available data streams, this thesis presents a deep learning-based framework that leverages multimodal data – i.e. sensor fusion – to perform the task of 3D object detection and localization. We provide an extensive review of the advancements of deep learning-based methods in computer vision, specifically in 2D and 3D object detection tasks. We also study the progress of the literature in both single-sensor and multi-sensor data fusion techniques. Furthermore, we present an in-depth explanation of our proposed approach, which performs sensor fusion using input streams from LiDAR and camera sensors, aiming to simultaneously perform 2D, 3D, and Bird’s Eye View detection. Our experiments highlight the importance of learnable data fusion mechanisms and multi-task learning, the impact of different CNN design decisions, speed-accuracy tradeoffs, and ways to deal with overfitting in multi-sensor data fusion frameworks.
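As a hedged illustration of what a learnable fusion mechanism can look like (the thesis architecture is not reproduced here), the sketch below concatenates camera and LiDAR feature maps, assumed already projected to a common view and resolution, and lets a 1x1 convolution learn the fusion weights instead of using a fixed rule such as averaging.

```python
import torch
import torch.nn as nn

# Illustrative sketch of learnable multimodal feature fusion.
class LearnableFusion(nn.Module):
    def __init__(self, cam_ch=256, lidar_ch=256, out_ch=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_ch + lidar_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU())

    def forward(self, cam_feat, lidar_feat):
        # Feature maps are assumed aligned to a common view/resolution
        return self.fuse(torch.cat([cam_feat, lidar_feat], dim=1))

cam = torch.randn(1, 256, 100, 100)
lidar = torch.randn(1, 256, 100, 100)
fused = LearnableFusion()(cam, lidar)  # feeds downstream 2D/3D/BEV heads
```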
57.
Cooperative edge deepfake detection
Hasanaj, Enis; Aveler, Albert; Söder, William (January 2021)
Deepfakes are an emerging problem in social media, and for celebrities and political figures it can be devastating to their reputation if the technology ends up in the wrong hands. Creating deepfakes is becoming increasingly easy. Attempts have been made at detecting whether a face in an image is real or not, but training these machine learning models can be a very time-consuming process. This research proposes a solution for training deepfake detection models cooperatively on the edge. This is done in order to evaluate whether the training process, among other things, can be made more efficient with this approach. The feasibility of edge training is evaluated by training machine learning models on several different types of iPhone devices. The models are trained using the YOLOv2 object detection system. To test whether the YOLOv2 object detection system is able to distinguish between real and fake human faces in images, several models are trained on a computer. Each model is trained with either a different number of iterations or a different subset of the data, since these factors have been identified as important to the performance of the models. The performance of the models is evaluated by measuring the accuracy in detecting deepfakes. Additionally, the deepfake detection models trained on a computer are ensembled using the bagging ensemble method. This is done in order to evaluate the feasibility of cooperatively training a deepfake detection model by combining several models. Results show that the proposed solution is not feasible due to the time the training process takes on each mobile device. Additionally, each trained model is about 200 MB, and the size of the ensemble model grows linearly with each model added to the ensemble. This can cause the ensemble model to grow to several hundred gigabytes in size.
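A minimal sketch of the bagging step (the model interface here is hypothetical): each member model outputs a probability that the face is fake, and the ensemble averages them. Because every member must be stored, ensemble size grows linearly with the number of models, which is the storage problem reported above.

```python
import numpy as np

def bagging_predict(models, image):
    """Average the P(fake) predicted by each member model.

    `models` is a hypothetical list of trained detectors, each trained on
    its own bootstrap subset of the data and exposing a predict() method.
    """
    probs = [m.predict(image) for m in models]
    return float(np.mean(probs))  # ensemble deepfake probability
```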
58.
Segmentation of cartilage tissue of mouse embryos in 3D micro CT data
Matula, Jan (January 2019)
Manual segmentation of cartilage tissue in micro CT images of mouse embryos is a very time-consuming process and significantly increases the time required for research into the development of mammalian facial structure. This problem might be solved by using a fully automatic segmentation algorithm. In this diploma thesis, a fully automatic segmentation method is proposed using a convolutional neural network trained on manually segmented data. The architecture of the proposed network is based on the U-Net architecture, with its encoding part replaced by the encoding part of the VGG16 classification convolutional neural network pretrained on the ImageNet database of labeled images. The proposed network achieves a Dice coefficient of 0.8731 ± 0.0326 against manually segmented images.
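A compact PyTorch sketch of this encoder substitution (the decoder layout follows the standard U-Net pattern and is an assumption, not the exact thesis network): the convolutional stages of an ImageNet-pretrained VGG16 form the encoder, and their intermediate outputs feed the skip connections.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class VGG16UNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        f = vgg16(weights="IMAGENET1K_V1").features  # pretrained encoder
        self.enc1, self.enc2 = f[:4], f[4:9]         # 64 ch @ 1/1, 128 ch @ 1/2
        self.enc3, self.enc4 = f[9:16], f[16:23]     # 256 ch @ 1/4, 512 ch @ 1/8
        self.up3 = nn.ConvTranspose2d(512, 256, 2, stride=2)
        self.dec3 = conv_block(512, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = conv_block(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = conv_block(128, 64)
        self.head = nn.Conv2d(64, n_classes, 1)

    def forward(self, x):  # H and W divisible by 8
        e1 = self.enc1(x)  # skip connection at full resolution
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        e4 = self.enc4(e3)  # deepest encoder features
        d3 = self.dec3(torch.cat([self.up3(e4), e3], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)  # per-pixel class logits

logits = VGG16UNet()(torch.randn(1, 3, 256, 256))  # (1, 2, 256, 256)
```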
59.
Segmentation of cranial bone after craniectomy
Vavřinová, Pavlína (January 2020)
This thesis deals with the segmentation of cranial bone in CT data of patients after craniectomy. The U-Net architecture, in 2D and 3D variants, was selected for solving this problem. The Jaccard index for the 2D U-Net was evaluated as 89.4%, and for the 3D U-Net it was 67.1%. In the area affected by the surgical intervention, the difference between the two variants was smaller: the average skull classification success rate was 98.4% for the 2D U-Net and 97.0% for the 3D U-Net.
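A minimal sketch of the reported evaluation metric: the Jaccard index (intersection over union) between a predicted and a manually segmented binary mask.

```python
import numpy as np

def jaccard_index(pred, target):
    """Jaccard index (IoU) of two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, target).sum() / union
```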
60.
Semantic segmentation of images using convolutional neural networks
Špila, Filip (January 2020)
This thesis deals with a survey and implementation of selected convolutional neural network architectures for image segmentation. The first part summarizes the basic concepts of neural network theory and presents the strengths of convolutional networks in the field of image recognition. The theoretical part concludes with a survey focused on a particular architecture used for scene segmentation. The implementation of this architecture and its variants in Caffe is adopted and adapted for the specific use in the practical part of the thesis. An integral part of this process are the steps needed to correctly set up the software and hardware environment; the corresponding chapter therefore provides precise instructions that new Linux users in particular will appreciate. A custom dataset containing 2600 images is created for training all variants of the selected network. Several adjustments of the original implementation are also made, especially for the purpose of using pre-trained parameters. The training involves the choice of hyperparameters, such as the type of optimization algorithm and the learning rate. Finally, the performance and computational cost of all trained networks are evaluated on the test dataset.
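A hedged sketch of this workflow through Caffe's Python interface (file names are placeholders): the solver prototxt defines the chosen hyperparameters such as the optimizer type and learning rate, pre-trained parameters are copied into the network, and training is run.

```python
import caffe

caffe.set_mode_gpu()

# solver.prototxt references the network definition and sets the
# hyperparameters discussed above (optimizer type, learning rate, ...).
solver = caffe.get_solver("solver.prototxt")

# Reuse pre-trained parameters for layers whose names match.
solver.net.copy_from("pretrained_weights.caffemodel")

solver.solve()  # run training; snapshots and testing follow the solver settings
```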