161 |
Convolutional Kernel Networks for Action Recognition in Videos. Wynen, Daan. January 2015.
While convolutional neural networks (CNNs) have taken the lead for many learning tasks, action recognition in videos has yet to see this jump in performance. Many teams are working on the issue, but so far there is no definitive answer on how to make CNNs work well with video data. Recently introduced convolutional kernel networks (CKNs) are a special case of CNNs which can be trained layer by layer in an unsupervised manner. This is done by approximating a kernel function in every layer with finite-dimensional descriptors. In this work we show the application of CKN training to video, discuss the necessary adjustments, and examine the influence of the type of data presented to the networks as well as the number of filters used.
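The layer-wise unsupervised training rests on a single operation: finding a finite-dimensional feature map whose inner products approximate a kernel on image patches. The NumPy sketch below illustrates that idea with a plain Nyström approximation of a Gaussian kernel; note that CKNs proper optimize their approximation rather than using Nyström, and all names and sizes here are illustrative.

```python
import numpy as np

def nystrom_embedding(patches, n_anchors=64, sigma=1.0, seed=0):
    """Map patches to finite-dimensional features whose inner product
    approximates a Gaussian kernel: phi(x) . phi(y) ~= k(x, y)."""
    rng = np.random.default_rng(seed)
    # Randomly sampled (or k-means-selected) patches serve as anchors.
    anchors = patches[rng.choice(len(patches), n_anchors, replace=False)]

    def gaussian(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    # Inverse square root of the anchor Gram matrix (ridge for stability).
    w, v = np.linalg.eigh(gaussian(anchors, anchors) + 1e-6 * np.eye(n_anchors))
    k_inv_sqrt = v @ np.diag(w ** -0.5) @ v.T
    return lambda x: gaussian(x, anchors) @ k_inv_sqrt

# Toy usage: 1000 random 5x5 grayscale patches, flattened to 25-vectors.
patches = np.random.rand(1000, 25)
phi = nystrom_embedding(patches)
x, y = patches[:1], patches[1:2]
approx = (phi(x) @ phi(y).T).item()  # ~= exp(-||x - y||^2 / (2 sigma^2))
```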
|
162 |
Detection and Segmentation of Brain Metastases with Deep Convolutional Networks. Losch, Max. January 2015.
As deep convolutional networks (ConvNets) reach spectacular results on a multitude of computer vision tasks and perform almost as well as a human rater on the task of segmenting gliomas in the brain, I investigated their applicability to detecting and segmenting brain metastases. I trained networks of increasing depth to improve the detection rate and introduced a border-pair scheme to reduce oversegmentation. A constraint on the time for segmenting a complete brain scan required the use of fully convolutional networks, which reduced the segmentation time from 90 minutes to 40 seconds. Despite noise and label errors present in the 490 full-brain MRI scans, the final network achieves a true positive rate of 82.8% and 0.05 misclassifications per slice, with a perfect detection score on all lesions greater than 3 mm. This work indicates that ConvNets are a suitable approach to both detecting and segmenting metastases, especially as further architectural extensions might improve the predictive performance even more.
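The reduction from 90 minutes to 40 seconds comes from scoring a whole slice in one fully convolutional forward pass instead of one pass per pixel-centered patch. The PyTorch sketch below shows this standard conversion on a made-up patch classifier (architecture, patch size, and layer widths are assumptions, not the thesis's actual network): the dense layers are reinterpreted as convolutions, so the same weights score every pixel at once.

```python
import torch
import torch.nn as nn

# A patch classifier: two conv blocks, then dense layers over a 9x9 patch.
patch_net = nn.Sequential(
    nn.Conv2d(1, 16, 3), nn.ReLU(),   # 9x9 -> 7x7
    nn.Conv2d(16, 32, 3), nn.ReLU(),  # 7x7 -> 5x5
    nn.Flatten(),
    nn.Linear(32 * 5 * 5, 64), nn.ReLU(),
    nn.Linear(64, 2),                 # metastasis vs. background
)

# The same weights as a fully convolutional net: each dense layer becomes
# a convolution, so one forward pass scores every pixel of a whole slice.
fcn = nn.Sequential(
    nn.Conv2d(1, 16, 3), nn.ReLU(),
    nn.Conv2d(16, 32, 3), nn.ReLU(),
    nn.Conv2d(32, 64, 5), nn.ReLU(),  # replaces Linear(32*5*5, 64)
    nn.Conv2d(64, 2, 1),              # replaces Linear(64, 2)
)
fcn[4].weight.data = patch_net[5].weight.data.view(64, 32, 5, 5)
fcn[4].bias.data = patch_net[5].bias.data
fcn[6].weight.data = patch_net[7].weight.data.view(2, 64, 1, 1)
fcn[6].bias.data = patch_net[7].bias.data

slice_ = torch.randn(1, 1, 256, 256)  # one (dummy) MRI slice
scores = fcn(slice_)                  # dense per-pixel class scores
```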
|
163 |
Curriculum Learning with Deep Convolutional Neural Networks. Avramova, Vanya. January 2015.
Curriculum learning is a machine learning technique inspired by the way humans acquire knowledge and skills: by mastering simple concepts first, and progressing through information of increasing difficulty to grasp more complex topics. Curriculum learning and its derivatives Self-Paced Learning (SPL) and Self-Paced Learning with Diversity (SPLD) have previously been applied within various machine learning contexts: Support Vector Machines (SVMs), perceptrons, and multi-layer neural networks, where they have been shown to improve both training speed and model accuracy. This project ventured to apply the techniques within the previously unexplored context of deep learning, by investigating how they affect the performance of a deep convolutional neural network (ConvNet) trained on a large labeled image dataset. The curriculum was formed by presenting the training samples to the network in order of increasing difficulty, measured by each sample's loss value under the network's objective function. The project evaluated SPL and SPLD, and proposed two new curriculum learning sub-variants, p-SPL and p-SPLD, which allow for a smooth progression of sample inclusion during training. The project also explored "inversed" versions of the SPL, SPLD, p-SPL and p-SPLD techniques, where the samples were selected for the curriculum in order of decreasing difficulty. The experiments demonstrated that all learning variants perform fairly similarly, within an ≈1% average test accuracy margin, based on five trained models per variant. Surprisingly, models trained with the inversed versions of the algorithms performed slightly better than the standard curriculum training variants. The SPLD-Inversed, SPL-Inversed and SPLD networks also registered marginally higher accuracy than the network trained with the usual random sample presentation. The results suggest that while sample ordering does affect the training process, the optimal order in which samples are presented may vary with the dataset and algorithm used. The project also investigated whether some samples were more beneficial for the training process than others. Based on sample difficulty, subsets of samples were removed from the training dataset, and the models trained on the remaining samples were compared to a default model trained on all samples. On the dataset used, removing the "easiest" 10% of samples had no effect on the achieved test accuracy compared to the default model, and removing the "easiest" 40% of samples reduced model accuracy by only ≈1% (compared to an ≈6% loss when the 40% "most difficult" samples were removed, and an ≈3% loss when 40% of samples were removed at random). Taking away the "easiest" samples first (up to a certain percentage of the dataset) affected the learning process less negatively than removing random samples, while removing the "most difficult" samples first had the most detrimental effect. The results suggest that the networks derived most of their learning value from the "difficult" samples, and that a large subset of the "easiest" samples can be excluded from training with minimal impact on the attained model accuracy. Moreover, it is possible to identify these samples early during training, which can greatly reduce the training time for these models.
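For reference, classic SPL admits, each epoch, exactly the samples whose current loss falls below a pace parameter that grows over training; the p-SPL/p-SPLD variants proposed here smooth that inclusion schedule. A minimal, self-contained sketch of the classic selection rule (the pace values are hypothetical):

```python
import numpy as np

def spl_weights(losses, lam):
    """Classic SPL: include sample i iff its loss is below the pace lambda.

    Minimizes sum_i v_i * l_i - lam * sum_i v_i over v in {0,1}^n, whose
    closed-form solution is v_i = 1[l_i < lam]."""
    return (losses < lam).astype(float)

# Toy demonstration: stand-in per-sample losses, with the pace parameter
# growing each epoch so that harder samples are gradually admitted.
rng = np.random.default_rng(0)
losses = rng.exponential(scale=1.0, size=1000)
lam, growth = 0.2, 1.5
for epoch in range(6):
    v = spl_weights(losses, lam)
    print(f"epoch {epoch}: training on {int(v.sum())} / {len(v)} samples")
    lam *= growth  # admit harder samples over time

# The "inversed" variants studied in the project flip the inequality,
# presenting the hardest samples first.
```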
|
164 |
Deep learning for measuring radon on plastic films. Löfgren, Max. January 2021.
Exposure to high levels of radon can be very harmful and lead to serious health issues, and measuring radon in buildings and houses is an important step in preventing this. One way of measuring radon is to place out plastic films to be exposed over a period of time and then analyze images of them. Image processing together with deep learning has become very useful for image recognition and analysis: training artificial neural networks on huge amounts of data to classify and predict new data is a widely used approach. In this project, artificial neural networks were trained to predict the radon measurement of exposed plastic films. The data consisted of microscopic images of these films, which were first modified to better fit the training and then sorted into two datasets. The datasets were divided into 10 classes with measurement values in intervals of 100 up to 1000. Two main types of neural networks were used, in different shapes and with different training parameters: convolutional neural networks and dense neural networks. The convolutional model was able to predict new data with 70 percent accuracy, and its performance increased with a bigger image size (more pixels) but not with a deeper network architecture. Over 90 percent of the incorrect predictions also belonged to a class in the interval just above or below the predicted result, which shows that the network has potential for improvement. The dense model had only 35 percent test accuracy despite a training accuracy of over 90 percent, because it was heavily overfitted. A way to get better results could be to expand the dataset with more images.
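For concreteness, here is a minimal PyTorch sketch of the two model families compared: a small convolutional classifier and a dense classifier over the 10 measurement classes. The image size and layer widths are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

# Assumed 128x128 grayscale film images; 10 classes as in the thesis
# (measurement intervals of 100 up to 1000).
conv_model = nn.Sequential(
    nn.Conv2d(1, 32, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 30 * 30, 10),   # 128 -> 126 -> 63 -> 61 -> 30
)

dense_model = nn.Sequential(       # no weight sharing: far more parameters
    nn.Flatten(),                  # per pixel, hence the heavy overfitting
    nn.Linear(128 * 128, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

films = torch.randn(4, 1, 128, 128)  # a dummy batch of film images
assert conv_model(films).shape == dense_model(films).shape == (4, 10)
```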
|
165 |
AI-Driven Image Manipulation: Image Outpainting Applied on Fashion Images. Mennborg, Alexander. January 2021.
The e-commerce industry frequently has to display product images on a website where the images are provided by selling partners. These images can have drastically different aspect ratios and resolutions, which makes it harder to present them while maintaining a coherent user experience. Manipulating images by cropping can result in parts of the foreground (i.e. the product or person within the image) being cut off. Image outpainting is a technique that allows images to be extended past their boundaries and can be used to alter the aspect ratio of images; together with object detection for locating the foreground, it makes it possible to manipulate images without sacrificing parts of the foreground. For image outpainting, a deep learning model that can extend images by at least 25% was trained on product images. The model achieves an 8.29 FID score, a 44.29 PSNR score and a 39.95 BRISQUE score. To test this solution in practice, a simple image manipulation pipeline was created which uses image outpainting when needed, and it shows promising results. Images can be manipulated in under a second running on a ZOTAC GeForce RTX 3060 (12GB) GPU and in a few seconds running on an Intel Core i7-8700K (16GB) CPU. There is also a special case of images where the background has been digitally replaced with a solid color; these can be outpainted even faster without deep learning.
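The pipeline's key decision, whether a center crop to the target aspect ratio would cut the detected foreground, can be expressed in a few lines. The sketch below is a hypothetical reconstruction of that step (the function name, box convention, and center-crop assumption are mine, not the thesis's); when it returns True the image is outpainted instead of cropped.

```python
def needs_outpainting(img_w, img_h, box, target_ratio):
    """Return True if center-cropping to `target_ratio` (width/height)
    would cut the detected foreground box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    if img_w / img_h > target_ratio:      # too wide: crop the sides
        margin = (img_w - img_h * target_ratio) / 2
        return x0 < margin or x1 > img_w - margin
    else:                                 # too tall: crop top/bottom
        margin = (img_h - img_w / target_ratio) / 2
        return y0 < margin or y1 > img_h - margin

# Example: a 600x900 product image with the product near the top edge
# cannot be center-cropped to 1:1 without cutting it, so we outpaint.
# (Solid-color backgrounds take the faster non-deep-learning path.)
print(needs_outpainting(600, 900, box=(10, 100, 280, 700),
                        target_ratio=1.0))  # True -> extend the image
```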
|
166 |
Convolutional neural networks applied to the emulation of the psychoacoustic model for MPEG-1, Layer I audio signal encoders. Sanchez Huapaya, Alonso Sebastián; Serpa Pinillos, Sergio André. 26 August 2020.
This work proposes four encoder alternatives inspired by the MPEG-1, Layer I encoder described in the ISO/IEC 11172-3 standard. The problem addressed is the need to explicitly define a psychoacoustic model in order to encode audio; here that model is replaced by neural networks. All the proposed encoders are based on multiscale convolutional neural networks (MCNNs) that emulate psychoacoustic model 1 of the referred encoder. The networks have 32 inputs that map the 32 subbands of the sound pressure level (SPL), and a single output that corresponds to one of the 32 subbands of either the signal-to-mask ratio (SMR) or the bit allocation vector; an encoder is thus composed of a set of 32 neural networks. The validation process took the first 10 seconds of 15 randomly chosen songs from 10 different musical genres. The audio signal quality of the proposed encoders was compared to that of the MPEG-1, Layer I encoder using the ODG metric. The encoder whose input is the SPL and whose output is the SMR, proposed by Guillermo Kemper, yielded the best results at 96 kbps and 192 kbps. The encoder named "SBU1" had the best results at 128 kbps.
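To make the ensemble structure concrete, the sketch below builds 32 small per-subband regressors in PyTorch, each mapping all 32 SPL subbands to one output subband. The layer sizes and kernel widths are placeholders, not the multiscale architecture of the actual work.

```python
import torch
import torch.nn as nn

class SubbandNet(nn.Module):
    """One of 32 per-subband regressors: all 32 SPL subbands in, one SMR
    (or bit-allocation) subband out. A simplified stand-in for the MCNNs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 1, kernel_size=1),
        )

    def forward(self, spl):       # spl: (batch, 32 subbands, frames)
        return self.net(spl)      # (batch, 1, frames): one output subband

encoder_models = [SubbandNet() for _ in range(32)]  # one net per subband
spl = torch.randn(8, 32, 12)                        # dummy SPL frames
smr = torch.cat([m(spl) for m in encoder_models], dim=1)  # (8, 32, 12)
```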
|
167 |
Prediction of velocity and absolute velocity of movement from human intracranial EEG data using deep neural networks. Vystrčilová, Michaela. January 2021.
Our brain controls the processes of the body, including movement. In this thesis, we try to understand how information about hand movement is encoded in the brain's electrical activity and how this activity can be used to predict the velocity and absolute velocity of hand movements. Using a well-established deep neural network architecture for EEG decoding, the Deep4Net, we predict hand movement velocity and absolute velocity from intracranial EEG signals. While reaching the expected performance level, we determine the influence of different frequency bands on the network's predictions. We find that modulations in the high-gamma frequency band are less informative than expected based on previous studies. We also identify two architectural modifications which lead to higher performance: 1. removing the max-pooling layers in the architecture leads to significantly higher correlations; 2. the non-uniform receptive field of the network is a potential drawback, making the network biased towards less relevant information.
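The first modification, removing the max-pooling layers, can be applied generically to a PyTorch model. Below is a minimal sketch on a toy model, not Deep4Net itself; note that layers downstream of a removed pool may need their input sizes adjusted, since feature maps stay larger.

```python
import torch.nn as nn

def strip_max_pooling(model: nn.Module) -> nn.Module:
    """Replace every max-pooling layer with an identity, the modification
    that improved correlations in this thesis (generic sketch)."""
    for name, child in model.named_children():
        if isinstance(child, (nn.MaxPool1d, nn.MaxPool2d, nn.MaxPool3d)):
            setattr(model, name, nn.Identity())
        else:
            strip_max_pooling(child)  # recurse into nested modules
    return model

toy = nn.Sequential(nn.Conv1d(64, 32, 5), nn.MaxPool1d(3), nn.ReLU())
toy = strip_max_pooling(toy)  # MaxPool1d -> Identity
print(toy)
```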
|
168 |
Deep Learning for 3D Perception: Computer Vision and Tactile Sensing. Garcia-Garcia, Alberto. 23 October 2019.
The care of dependent people (due to aging, accidents, disabilities or illnesses) is one of the top-priority lines of research for European countries, as stated in the Horizon 2020 goals. In order to minimize the cost and the intrusiveness of therapies for care and rehabilitation, it is desirable that such care be administered at the patient's home. The natural solution for this environment is an indoor mobile robotic platform. Such a robotic platform for home care needs to solve, to a certain extent, a set of problems that lie at the intersection of multiple disciplines, e.g., computer vision, machine learning, and robotics. At that crossroads, one of the most notable challenges (and the one we will focus on) is scene understanding: the robot needs to understand the unstructured and dynamic environment in which it navigates and the objects with which it can interact. To achieve full scene understanding, various tasks must be accomplished. In this thesis we focus on three of them: object class recognition, semantic segmentation, and grasp stability prediction. The first refers to the process of categorizing an object into a set of classes (e.g., chair, bed, or pillow); the second goes one level beyond object categorization and aims to provide a per-pixel dense labeling of each object in an image; the third consists of determining whether an object that has been grasped by a robotic hand is in a stable configuration or will fall. This thesis presents contributions towards solving those three tasks using deep learning as the main tool for the recognition, segmentation, and prediction problems. All these solutions share one core observation: they rely on three-dimensional data inputs to leverage that additional dimension and its spatial arrangement. The four main contributions of this thesis are: first, we show a set of architectures and data representations for 3D object classification using point clouds; second, we carry out an extensive review of the state of the art in semantic segmentation datasets and methods; third, we introduce a novel synthetic, large-scale, photorealistic dataset for solving various robotic and vision problems together; finally, we propose a novel method and representation to deal with tactile sensors and learn to predict grasp stability.
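As one example of leveraging the third dimension directly, here is a minimal PointNet-style classifier over raw point clouds: a shared per-point MLP followed by an order-invariant max pool. It illustrates the family of architectures surveyed for 3D object classification, not any specific model from the thesis.

```python
import torch
import torch.nn as nn

class PointCloudClassifier(nn.Module):
    """Shared per-point MLP + symmetric max pool, so the prediction is
    invariant to the ordering of the input points."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.head = nn.Linear(256, n_classes)

    def forward(self, pts):               # pts: (batch, n_points, 3)
        feats = self.point_mlp(pts)       # per-point features
        pooled = feats.max(dim=1).values  # order-invariant aggregation
        return self.head(pooled)

clouds = torch.randn(4, 1024, 3)          # 4 clouds of 1024 xyz points
logits = PointCloudClassifier()(clouds)   # (4, 10) class scores
```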
|
169 |
Tabular Information Extraction from Datasheets with Deep Learning for Semantic Modeling. Akkaya, Yakup. 22 March 2022.
The growing popularity of artificial intelligence and machine learning has led many institutions and organizations in industry to adopt the vision of automation. Many corporations have made it their primary objective to deliver goods and services and to manufacture more efficiently, with minimal human intervention. Automated document processing and analysis is also a critical component of this cycle for many organizations that contribute to the supply chain, and the massive volume and diversity of data created in this rapidly evolving environment make it a highly desired step. Despite this diversity, the important information in documents is provided in tables. As a result, extracting tabular data is a crucial aspect of document processing.

This thesis applies deep learning methodologies to detect table structure elements for data extraction and preparation for semantic modelling. In order to find the optimal structure definition, we analyzed the performance of deep learning models on different formats such as row/column and cell. The combined row and column detection models perform poorly compared to the other models due to the highly overlapping nature of rows and columns. Separate row and column detection models achieve the best average F1-scores, 78.5% and 79.1% respectively. However, determining cell elements from the row and column detections for semantic modelling is a complicated task due to spanning rows and columns. Considering these facts, we propose a new method of setting the ground-truth information, called content-focused annotation, to define table elements better. Our content-focused method is competent in handling ambiguities caused by large white spaces and the lack of boundary lines in table structures; hence, it provides higher accuracy.

Prior works have addressed the table analysis problem as separate table detection and table structure detection tasks. However, the impact of dataset structure on table structure detection has not been investigated. We provide a comparison of table structure detection performance on cropped and uncropped datasets. The cropped set consists only of table images cropped from documents, assuming tables are detected perfectly; the uncropped set consists of regular document images. Experiments show that deep learning models can improve detection performance by up to 9% in average precision and average recall on the cropped versions. Furthermore, the impact of cropping is negligible at Intersection over Union (IoU) thresholds of 50%-70% compared to the uncropped versions; beyond a 70% IoU threshold, however, cropped datasets provide significantly higher detection performance.
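The IoU thresholds above determine when a predicted region counts as a correct detection. A minimal sketch of the metric for axis-aligned boxes (the (x0, y0, x1, y1) box convention is an assumption):

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))   # overlap width
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))   # overlap height
    inter = iw * ih
    union = ((ax1 - ax0) * (ay1 - ay0)
             + (bx1 - bx0) * (by1 - by0) - inter)
    return inter / union if union > 0 else 0.0

# A detected table column vs. its ground truth: IoU ~ 0.71, so it counts
# as a hit at a 0.7 threshold but not at stricter ones.
print(iou((100, 40, 160, 400), (110, 40, 170, 400)))
```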
|
170 |
Semantic Segmentation of Urban Scene Images Using Recurrent Neural Networks. Daliparthi, Venkata Satya Sai Ajay. January 2020.
Background: In autonomous driving vehicles, the vehicle receives pixel-wise sensor data from RGB cameras, point-wise depth information from the cameras, and sensor data as input. The computer inside the autonomous driving vehicle processes the input data and provides the desired output, such as steering angle, torque, and brake. To make accurate decisions, the computer inside the vehicle should be completely aware of its surroundings and understand each pixel in the driving scene. Semantic segmentation is the task of assigning a class label (such as car, road, pedestrian, or sky) to each pixel in a given image, so a better-performing semantic segmentation algorithm will contribute to the advancement of the autonomous driving field. Research Gap: Traditional methods, such as handcrafted features and feature extraction methods, were mainly used to solve semantic segmentation. Since the rise of deep learning, most works use deep learning to deal with semantic segmentation, and the most commonly used neural network architecture has been the Convolutional Neural Network (CNN). Even though some works made use of Recurrent Neural Networks (RNNs), the effect of RNNs on semantic segmentation has not yet been thoroughly studied. Our study addresses this research gap. Idea: After going through the existing literature, we came up with the idea of "using RNNs as an add-on module, to augment the skip-connections in semantic segmentation networks through residual connections." Objectives and Method: The main objective of our work is to improve the performance of semantic segmentation networks by using RNNs; experiments were chosen as the methodology for our study. We proposed three novel architectures called UR-Net, UAR-Net, and DLR-Net by applying our idea to the existing networks U-Net, Attention U-Net, and DeepLabV3+, respectively. Results and Findings: We empirically showed that our proposed architectures improve the segmentation of edges and boundaries. Through our study, we found that there is a trade-off between using RNNs and the inference time of the model: if we use RNNs to improve the performance of semantic segmentation networks, we need to trade off some extra seconds during inference. Conclusion: Our findings will not contribute to the autonomous driving field, where better real-time performance is needed, but they will contribute to the advancement of biomedical image segmentation, where doctors can trade off those extra seconds during inference for better performance.
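A minimal PyTorch sketch of the core idea, an RNN added onto a skip connection through a residual connection. Scanning feature-map rows with a GRU is one plausible arrangement; the exact wiring inside UR-Net, UAR-Net, and DLR-Net may differ, and the sizes here are illustrative.

```python
import torch
import torch.nn as nn

class RNNSkip(nn.Module):
    """Augment a skip-connection tensor with an RNN via a residual add,
    so the RNN acts as an add-on module rather than replacing the skip."""
    def __init__(self, channels):
        super().__init__()
        self.rnn = nn.GRU(channels, channels, batch_first=True)

    def forward(self, skip):                   # skip: (B, C, H, W)
        b, c, h, w = skip.shape
        seq = skip.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out, _ = self.rnn(seq)                 # scan across image rows
        out = out.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return skip + out                      # residual connection

features = torch.randn(2, 64, 32, 32)  # a skip-connection tensor
augmented = RNNSkip(64)(features)      # same shape: (2, 64, 32, 32)
```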
|