161

Curriculum Learning with Deep Convolutional Neural Networks

Avramova, Vanya January 2015
Curriculum learning is a machine learning technique inspired by the way humans acquire knowledge and skills: by mastering simple concepts first, and progressing through information of increasing difficulty to grasp more complex topics. Curriculum learning, and its derivatives Self-Paced Learning (SPL) and Self-Paced Learning with Diversity (SPLD), have previously been applied in various machine learning contexts: Support Vector Machines (SVMs), perceptrons, and multi-layer neural networks, where they have been shown to improve both training speed and model accuracy. This project ventured to apply the techniques in the previously unexplored context of deep learning, by investigating how they affect the performance of a deep convolutional neural network (ConvNet) trained on a large labeled image dataset. The curriculum was formed by presenting the training samples to the network in order of increasing difficulty, measured by each sample's loss value under the network's objective function. The project evaluated SPL and SPLD, and proposed two new curriculum learning sub-variants, p-SPL and p-SPLD, which allow for a smooth progression of sample inclusion during training (a selection step is sketched below). The project also explored "inversed" versions of the SPL, SPLD, p-SPL and p-SPLD techniques, where samples were selected for the curriculum in order of decreasing difficulty. The experiments demonstrated that all learning variants perform fairly similarly, within an ≈1% average test-accuracy margin, based on five trained models per variant. Surprisingly, models trained with the inversed versions of the algorithms performed slightly better than the standard curriculum training variants. The SPLD-Inversed, SPL-Inversed and SPLD networks also registered marginally higher accuracy than a network trained with the usual random sample presentation. The results suggest that while sample ordering does affect the training process, the optimal order in which samples are presented may vary with the dataset and algorithm used. The project also investigated whether some samples are more beneficial to the training process than others. Based on sample difficulty, subsets of samples were removed from the training dataset, and models trained on the remaining samples were compared to a default model trained on all samples. On the dataset used, removing the "easiest" 10% of samples had no effect on test accuracy compared to the default model, and removing the "easiest" 40% of samples reduced accuracy by only ≈1% (compared to an ≈6% loss when the 40% "most difficult" samples were removed, and an ≈3% loss when 40% of samples were removed at random). Taking away the "easiest" samples first (up to a certain percentage of the dataset) affected the learning process less negatively than removing random samples, while removing the "most difficult" samples first had the most detrimental effect. The results suggest that the networks derived most of their learning value from the "difficult" samples, and that a large subset of the "easiest" samples can be excluded from training with minimal impact on the attained model accuracy. Moreover, these samples can be identified early during training, which can greatly reduce training time for these models.
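As an illustration of the selection mechanism described above, here is a minimal PyTorch sketch of one self-paced selection step: per-sample losses under the current model decide which samples enter training, and a growing pace parameter admits harder samples over time. The model, data, and pace schedule are toy placeholders, not the thesis's actual setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def spl_mask(model, x, y, lam):
    """One self-paced selection step: keep samples whose current loss is
    below the pace parameter lam, so "easy" samples enter training first.
    Flipping the comparison gives the "inversed" variants above."""
    model.eval()
    with torch.no_grad():
        losses = F.cross_entropy(model(x), y, reduction="none")
    return losses < lam  # binary SPL weights v_i in {0, 1}

# Toy usage with a hypothetical classifier on random data.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))
lam = 2.0
for epoch in range(5):
    keep = spl_mask(model, x, y, lam)
    # ... run an optimizer step on x[keep], y[keep] ...
    lam *= 1.3  # grow the pace so harder samples join in later epochs
```
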
162

Deep learning for measuring radon on plastic films

Löfgren, Max January 2021
Exposure to high levels of radon can be very harmful and lead to serious health issues, so measuring radon in buildings and houses is an important preventive step. One way of measuring radon is to place plastic films to be exposed over a period of time and then analyze images of them. Image processing together with deep learning has become very useful for image recognition and analysis, and training artificial neural networks on large amounts of data to classify and predict new data is a widely used approach. In this project, artificial neural networks were trained to predict the radon measurement of exposed plastic films. The data consisted of microscopic images of these films, which were first modified to better fit the training and then sorted into two datasets. The datasets were divided into 10 classes covering measurement values in intervals of 100 up to 1000 (see the sketch below). Two main types of neural networks were used, in different shapes and with different training parameters: convolutional neural networks and dense neural networks. The convolutional model was able to predict new data with 70 percent accuracy, and its performance increased with a larger image size (more pixels) but not with a deeper network architecture. Over 90 percent of the wrong predictions also belonged to a class in the interval just above or below the predicted result, which shows that the network has potential for improvement. The dense model only reached 35 percent accuracy despite a training accuracy of over 90 percent, because it was heavily overfitted. A way to get better results could be to enlarge the dataset with more images.
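A minimal sketch of the class binning described above, assuming raw measurement values on a 0-1000 scale; the example values are hypothetical.

```python
import numpy as np

# Hypothetical radon measurement values for the exposed films.
values = np.array([37.0, 512.3, 149.9, 980.4, 701.2])

# Intervals of width 100 up to 1000 -> 10 classes (0..9),
# class k covering the range [100k, 100(k+1)).
labels = np.clip(values // 100, 0, 9).astype(int)
print(labels)  # -> [0 5 1 9 7]
```
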
163

AI-Driven Image Manipulation : Image Outpainting Applied on Fashion Images

Mennborg, Alexander January 2021
The e-commerce industry frequently has to display product images on a website where the images are provided by selling partners. The images in question can have drastically different aspect ratios and resolutions, which makes it harder to present them while maintaining a coherent user experience. Manipulating images by cropping can sometimes cut off parts of the foreground (i.e. the product or person within the image). Image outpainting is a technique that allows images to be extended past their boundaries and can be used to alter the aspect ratio of images; combined with object detection for locating the foreground, it makes it possible to manipulate images without sacrificing parts of the foreground. For image outpainting, a deep learning model that can extend images by at least 25% was trained on product images. The model achieves an FID score of 8.29, a PSNR score of 44.29 and a BRISQUE score of 39.95. To test this solution in practice, a simple image manipulation pipeline was created that uses image outpainting when needed, and it shows promising results. Images can be manipulated in under a second running on a ZOTAC GeForce RTX 3060 (12 GB) GPU and in a few seconds running on an Intel Core i7-8700K (16 GB) CPU. There is also a special case of images where the background has been digitally replaced with a solid color; these can be outpainted even faster without deep learning (a sketch of this case follows).
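The solid-color special case mentioned above can be handled with plain padding. A minimal sketch, assuming an RGB image whose top-left corner pixel represents the background color (the detection heuristic is our assumption, not the thesis's method):

```python
from PIL import Image
import numpy as np

def pad_solid_background(img: Image.Image, target_ratio: float) -> Image.Image:
    """Extend an image with a solid-color background to a target
    width/height ratio by padding with the border color."""
    a = np.asarray(img)
    # Assume the top-left corner pixel represents the background color.
    bg = tuple(int(c) for c in a[0, 0])
    w, h = img.size
    if w / h < target_ratio:                 # too narrow: pad left/right
        new_w, new_h = int(round(h * target_ratio)), h
    else:                                    # too wide: pad top/bottom
        new_w, new_h = w, int(round(w / target_ratio))
    canvas = Image.new(img.mode, (new_w, new_h), bg)
    canvas.paste(img, ((new_w - w) // 2, (new_h - h) // 2))
    return canvas

# e.g. square = pad_solid_background(Image.open("product.jpg"), 1.0)
```
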
164

Convolutional neural networks applied to the emulation of the psychoacoustic model for MPEG-1, Layer I audio signal encoders

Sanchez Huapaya, Alonso Sebastián, Serpa Pinillos, Sergio André 26 August 2020
The present work proposes four encoder alternatives inspired by the MPEG-1, Layer I encoder described in the ISO/IEC 11172-3 standard. The problem addressed is the need to explicitly define a psychoacoustic model in order to encode audio; here that model is replaced by neural networks. All of the proposed encoders are based on multiscale convolutional neural networks (MCNN) that emulate psychoacoustic model 1 of the referenced encoder. The networks have 32 inputs corresponding to the 32 subbands of the sound pressure level (SPL), and a single output corresponding to one of the 32 subbands of either the signal-to-mask ratio (SMR) or the bit-allocation vector; an encoder is thus composed of a set of 32 neural networks (one per-subband network is sketched below). Validation used the first 10 seconds of 15 randomly chosen songs from 10 different musical genres. The audio quality produced by each proposed encoder was compared to that of the MPEG-1, Layer I encoder using the ODG metric. The encoder whose input is the SPL and whose output is the SMR, proposed by Guillermo Kemper, yielded the best results at 96 kbps and 192 kbps; the encoder named "SBU1" yielded the best results at 128 kbps.
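A hedged PyTorch sketch of the per-subband design described above: each of the 32 networks maps the 32 SPL subbands (over a window of time frames) to a single output for its subband, with parallel convolution branches standing in for the multiscale structure. All layer sizes are assumptions, not the thesis's exact MCNN.

```python
import torch
import torch.nn as nn

class SubbandNet(nn.Module):
    """One of the 32 per-subband networks: maps the 32 SPL subbands over
    T time frames to a single value (SMR or bit allocation) for its
    subband. Parallel 1-D conv branches of different kernel sizes stand
    in for the multiscale structure; all sizes are assumptions."""
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(32, 16, k, padding=k // 2) for k in (3, 5, 7)
        )
        self.head = nn.Sequential(
            nn.ReLU(), nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(48, 1)
        )

    def forward(self, spl):                     # spl: (batch, 32, T)
        z = torch.cat([b(spl) for b in self.branches], dim=1)
        return self.head(z)                     # (batch, 1) estimate

encoder = [SubbandNet() for _ in range(32)]     # one net per output subband
x = torch.randn(1, 32, 12)                      # a toy SPL window
smr = torch.cat([net(x) for net in encoder], dim=1)  # (1, 32)
```
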
165

Prediction of velocity and absolute velocity of movement from human intracranial EEG data using deep neural networks

Vystrčilová, Michaela January 2021
Our brain controls the processes of the body, including movement. In this thesis, we try to understand how information about hand movement is encoded in the brain's electrical activity and how this activity can be used to predict the velocity and absolute velocity of hand movements. Using a well-established deep neural network architecture for EEG decoding, Deep4Net, we predict hand movement velocity and absolute velocity from intracranial EEG signals. While reaching the expected performance level, we determine the influence of different frequency bands on the network's predictions and find that modulations in the high-gamma frequency band are less informative than expected based on previous studies. We also identify two architectural modifications that lead to higher performance: first, removing the max-pooling layers in the architecture leads to significantly higher correlations (see the sketch below); second, the non-uniform receptive field of the network is a potential drawback that biases the network towards less relevant information.
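A generic PyTorch fragment illustrating the first modification: the same temporal convolution block built with and without its max-pooling layer. The layer sizes are illustrative and simplified to 1-D convolutions; they are not the exact Deep4Net values.

```python
import torch.nn as nn

def conv_block(c_in, c_out, pool: bool):
    """A Deep4Net-style temporal conv block. Setting pool=False drops
    the max-pooling layer, as in the first modification above;
    channel counts and kernel sizes are illustrative assumptions."""
    layers = [nn.Conv1d(c_in, c_out, kernel_size=10),
              nn.BatchNorm1d(c_out), nn.ELU()]
    if pool:
        layers.append(nn.MaxPool1d(kernel_size=3, stride=3))
    return nn.Sequential(*layers)

# Baseline keeps pooling; the modified variant preserves full
# temporal resolution through the network.
baseline = nn.Sequential(conv_block(85, 50, True), conv_block(50, 100, True))
modified = nn.Sequential(conv_block(85, 50, False), conv_block(50, 100, False))
```
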
166

Deep Learning for 3D Perception: Computer Vision and Tactile Sensing

Garcia-Garcia, Alberto 23 October 2019
The care of dependent people (for reasons of aging, accidents, disabilities or illnesses) is one of the top priority lines of research for the European countries, as stated in the Horizon 2020 goals. In order to minimize the cost and the intrusiveness of therapies for care and rehabilitation, it is desirable that such care be administered at the patient's home. The natural solution for this environment is an indoor mobile robotic platform. Such a robotic platform for home care needs to solve, to a certain extent, a set of problems that lie at the intersection of multiple disciplines, e.g., computer vision, machine learning, and robotics. At that crossroads, one of the most notable challenges (and the one we will focus on) is scene understanding: the robot needs to understand the unstructured and dynamic environment in which it navigates and the objects with which it can interact. To achieve full scene understanding, various tasks must be accomplished. In this thesis we will focus on three of them: object class recognition, semantic segmentation, and grasp stability prediction. The first refers to the process of categorizing an object into a set of classes (e.g., chair, bed, or pillow); the second goes one level beyond object categorization and aims to provide a per-pixel dense labeling of each object in an image; the last consists of determining whether an object that has been grasped by a robotic hand is in a stable configuration or will fall. This thesis presents contributions towards solving those three tasks using deep learning as the main tool for the recognition, segmentation, and prediction problems involved. All those solutions share one core observation: they all rely on three-dimensional data inputs to leverage that additional dimension and its spatial arrangement. The four main contributions of this thesis are: first, we show a set of architectures and data representations for 3D object classification using point clouds (one representative style is sketched below); second, we carry out an extensive review of the state of the art of semantic segmentation datasets and methods; third, we introduce a novel synthetic and large-scale photorealistic dataset for solving various robotic and vision problems together; finally, we propose a novel method and representation to deal with tactile sensors and learn to predict grasp stability.
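As one representative of point-cloud classification architectures (our choice for illustration, not necessarily one studied in the thesis), a minimal PointNet-style sketch: a shared per-point MLP followed by a symmetric max-pool, which makes the prediction invariant to point ordering.

```python
import torch
import torch.nn as nn

class PointCloudClassifier(nn.Module):
    """PointNet-style sketch: a shared per-point MLP followed by a
    symmetric max-pool over points, so the prediction is invariant
    to point order. Layer sizes are illustrative assumptions."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(), nn.Conv1d(64, 256, 1), nn.ReLU()
        )
        self.head = nn.Linear(256, num_classes)

    def forward(self, pts):                  # pts: (batch, 3, n_points)
        feats = self.point_mlp(pts)          # per-point features
        global_feat = feats.max(dim=2).values  # symmetric aggregation
        return self.head(global_feat)

logits = PointCloudClassifier()(torch.randn(2, 3, 1024))  # (2, 10)
```
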
167

Tabular Information Extraction from Datasheets with Deep Learning for Semantic Modeling

Akkaya, Yakup 22 March 2022
The growing popularity of artificial intelligence and machine learning has led many institutions and organizations to adopt the industry's vision of automation. Many corporations have made it their primary objective to deliver goods and services and to manufacture more efficiently, with minimal human intervention. Automated document processing and analysis is also a critical component of this cycle for many organizations that contribute to the supply chain, and the massive volume and diversity of data created in this rapidly evolving environment make it a highly desired step. Despite this diversity, much of the important information in documents is provided in tables, so extracting tabular data is a crucial aspect of document processing. This thesis applies deep learning methodologies to detect table structure elements for the extraction of data and preparation for semantic modelling. In order to find the optimal structure definition, we analyzed the performance of deep learning models on different formats, such as row/column and cell. The combined row-and-column detection models perform poorly compared to the other models due to the highly overlapping nature of rows and columns. Separate row and column detection models achieve the best average F1-scores, 78.5% and 79.1% respectively. However, determining cell elements from the row and column detections for semantic modelling is a complicated task due to spanning rows and columns. Considering these facts, a new method for setting the ground-truth information, called content-focused annotation, is proposed to better define table elements. Our content-focused method is competent in handling ambiguities caused by large white spaces and the lack of boundary lines in table structures, and hence provides higher accuracy. Prior works have addressed the table analysis problem as table detection and table structure detection tasks; however, the impact of dataset structure on table structure detection has not been investigated. We provide a comparison of table structure detection performance on cropped and uncropped datasets. The cropped set consists only of table images cropped from documents, assuming tables are detected perfectly; the uncropped set consists of regular document images. Experiments show that deep learning models can improve detection performance by up to 9% in average precision and average recall on the cropped versions. Furthermore, the impact of cropped images is negligible at Intersection over Union (IoU) thresholds of 50%-70% compared to the uncropped versions (the metric is sketched below); beyond 70% IoU, however, cropped datasets provide significantly higher detection performance.
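For reference, the IoU metric behind those thresholds, sketched for axis-aligned boxes; the example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A predicted row region vs. its ground truth:
print(iou((0, 0, 100, 20), (0, 5, 100, 25)))  # -> 0.6
```
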
168

Semantic Segmentation of Urban Scene Images Using Recurrent Neural Networks

Daliparthi, Venkata Satya Sai Ajay January 2020
Background: In autonomous driving vehicles, the vehicle receives pixel-wise data from RGB cameras, point-wise depth information from the cameras, and other sensor data as input. The computer inside the autonomous driving vehicle processes the input data and provides the desired output, such as steering angle, torque, and brake. To make accurate decisions, the computer inside the vehicle should be completely aware of its surroundings and understand each pixel in the driving scene. Semantic Segmentation is the task of assigning a class label (such as Car, Road, Pedestrian, or Sky) to each pixel in a given image, so a better-performing Semantic Segmentation algorithm will contribute to the advancement of the autonomous driving field. Research Gap: Traditional methods, such as handcrafted features and feature extraction methods, were mainly used to solve Semantic Segmentation. Since the rise of deep learning, most works use deep learning to deal with Semantic Segmentation, most commonly with Convolutional Neural Network (CNN) architectures. Even though some works have made use of Recurrent Neural Networks (RNNs), the effect of RNNs on Semantic Segmentation has not yet been thoroughly studied; our study addresses this research gap. Idea: After going through the existing literature, we came up with the idea of using RNNs as an add-on module to augment the skip-connections in Semantic Segmentation networks through residual connections (sketched below). Objectives and Method: The main objective of our work is to improve Semantic Segmentation networks' performance by using RNNs, and the experiment was chosen as the methodology for our study. We proposed three novel architectures, called UR-Net, UAR-Net, and DLR-Net, by applying our idea to the existing networks U-Net, Attention U-Net, and DeepLabV3+ respectively. Results and Findings: We empirically showed that our proposed architectures improve the segmentation of edges and boundaries. Through our study, we found that there is a trade-off between using RNNs and the inference time of the model: using RNNs to improve the performance of Semantic Segmentation networks costs some extra seconds during inference. Conclusion: Our findings will not contribute to the autonomous driving field, where better performance is needed in real time, but they will contribute to the advancement of biomedical image segmentation, where doctors can trade off those extra seconds during inference for better performance.
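A hedged PyTorch sketch of the idea: a GRU runs over a skip-connection feature map (treating image rows as a sequence) and its output is added back residually. The sizes and the row-wise scan are illustrative assumptions, not the exact UR-Net / UAR-Net / DLR-Net designs.

```python
import torch
import torch.nn as nn

class RNNSkip(nn.Module):
    """RNN add-on for a skip connection: a GRU scans the feature map
    rows as a sequence and its output is added back residually.
    Illustrative only; not the thesis's exact architecture."""
    def __init__(self, channels: int):
        super().__init__()
        self.gru = nn.GRU(channels, channels, batch_first=True)

    def forward(self, skip):                              # (B, C, H, W)
        b, c, h, w = skip.shape
        seq = skip.permute(0, 3, 2, 1).reshape(b * w, h, c)  # rows as sequences
        out, _ = self.gru(seq)
        out = out.reshape(b, w, h, c).permute(0, 3, 2, 1)
        return skip + out                                 # residual connection

refined = RNNSkip(64)(torch.randn(1, 64, 32, 32))         # same shape out
```
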
169

Analyzing white blood cells using deep learning techniques

Neelakantan, Suraj, Kalidindi, Sai Sushanth Varma January 2020
The field of hematology involves the analysis of blood and its components, such as platelets, red blood cells, and white blood cells. The outcome of this analysis can be vital in determining the condition of the human body, so it is important to obtain accurate results. A deep learning algorithm scans the given input data for unique features and learns them, then identifies these features and correlates them to produce a result; this can save a significant amount of time and manual work. In contrast, a traditional machine learning algorithm requires the developer to carry out the feature engineering. This thesis involves the analysis of white blood cells (WBCs) using deep learning techniques. In collaboration with the hematology company HemoCue AB, based in Ängelholm, we developed deep learning algorithms for the analysis of white blood cells in the HemoCue® WBC DIFF System. There are two main stages in this thesis. The first stage is white blood cell identification, which is used to calculate the number of white blood cells in a given blood sample. The next stage is to identify the different types of white blood cells, from which the concentration of each type of WBC in the given blood sample is calculated. We explored different classification approaches, such as 'one vs all' and a '4-class classifier', and developed two CNN architectural designs, 'multi-input' and 'multi-channel'. After comparing the performance of all these design approaches, a final integrated model is put forth for the analysis of WBCs in the company's device (its two-stage inference is sketched below). The proposed 'one vs all' classification approach combined with a 3-class CNN classifier yielded very promising results, with a combined accuracy of 95.45% in WBC identification and 90.49% in WBC differential classification.
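A minimal sketch of the two-stage analysis described above (identification for the count, then differential typing), with hypothetical Keras-style models standing in for the thesis's networks; the exact 'one vs all' scheme may differ.

```python
import numpy as np

def classify_patches(patches, id_model, diff_model, threshold=0.5):
    """Two-stage sketch: a binary identification model supplies the
    WBC count and gates a 3-class differential classifier.
    Both models are hypothetical stand-ins, assumed to expose a
    Keras-style predict() returning per-class probabilities."""
    probs = id_model.predict(patches)            # (N, 1) WBC probability
    keep = probs[:, 0] >= threshold
    wbc_count = int(keep.sum())                  # stage 1: identification
    types = diff_model.predict(patches[keep])    # (n_kept, 3) class scores
    return wbc_count, np.argmax(types, axis=1)   # stage 2: differential
```
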
170

Machine Learning Methods for Brain Lesion Delineation

Raina, Kevin 02 October 2020
Brain lesions are regions of abnormal or damaged tissue in the brain, commonly due to stroke, cancer or other disease. They are diagnosed primarily using neuroimaging, the most common modalities being Magnetic Resonance Imaging (MRI) or Computed Tomography (CT). Brain lesions have a high degree of variability in terms of location, size, intensity and form, which makes diagnosis challenging. Traditionally, radiologists diagnose lesions by inspecting neuroimages directly by eye; however, this is time-consuming and subjective. For these reasons, many automated methods have been developed for lesion delineation (segmentation), lesion identification and diagnosis. The goal of this thesis is to improve and develop automated methods for delineating brain lesions from multimodal MRI scans. First, we propose an improvement to existing segmentation methods by exploiting the bilateral quasi-symmetry of healthy brains, which breaks down when lesions are present. We augment our data using nonlinear registration of a neuroimage to a reflected version of itself, leading to an improvement in Dice coefficient of 13 percent. Second, we model lesion volume in brain image patches with a modified Poisson regression method. The model accurately identified the lesion image with the larger lesion volume for 86 percent of paired sample patches. Both of these projects were published in the proceedings of the BIOSTEC 2020 conference. In the last two chapters, we propose a confidence-based approach to measure segmentation uncertainty, and apply an unsupervised segmentation method based on mutual information.
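As a sketch of the second contribution's base model: the thesis uses a modified Poisson regression method, while this shows only plain Poisson regression via scikit-learn on synthetic patch features (both the features and the targets are hypothetical).

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
# Hypothetical per-patch features (e.g. summary intensity statistics)
# and lesion-volume targets in voxels, drawn from a Poisson model.
X = rng.normal(size=(200, 4))
y = rng.poisson(lam=np.exp(0.5 + X @ np.array([0.8, -0.3, 0.1, 0.0])))

# Fit a log-linear Poisson regression of lesion volume on the features.
model = PoissonRegressor(alpha=1e-3).fit(X, y)
print(model.predict(X[:3]))  # expected lesion volume for three patches
```
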
