161 |
Three problems in imaging systems: texture re-rendering in online decoration design, a novel monochrome halftoning algorithm, and face set recognition with convolutional neural networks. Tongyang Liu (5929991), 25 June 2020
This thesis discusses studies of three problems in imaging systems.
The first problem deals with re-rendering segments of online indoor room images with preferred textures through websites, in order to try out new decoration ideas. Previous methods require extensive manual positioning and alignment. This thesis presents a novel approach that automatically achieves a natural result consistent with the indoor room's geometric layout.
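A minimal sketch of the planar warp that such re-rendering builds on, assuming the segment's four corners are already known (the thesis derives them automatically from the room layout); the `rerender_segment` helper is hypothetical:

```python
import cv2
import numpy as np

# Warp a texture onto one planar room segment via a perspective transform.
# The corner points are assumed given; estimating them from the room layout
# is the hard part that the thesis automates.
def rerender_segment(room, texture, corners):
    """corners: four (x, y) points of the target segment, clockwise."""
    h, w = texture.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(src, np.float32(corners))
    size = (room.shape[1], room.shape[0])
    warped = cv2.warpPerspective(texture, H, size)
    mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), H, size)
    out = room.copy()
    out[mask > 0] = warped[mask > 0]  # paste the warped texture in place
    return out
```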
For the second problem, laser electrophotographic systems need a digital halftoning algorithm that can handle unequal printing resolutions, since most halftoning algorithms assume equal resolution. This thesis presents a novel monochrome halftoning algorithm that renders continuous-tone images with a limited number of tone levels on laser printers with unequal printing resolutions.
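For context, classical error diffusion at equal resolution follows the Floyd-Steinberg scheme sketched below; the thesis's algorithm for the unequal-resolution case is not reproduced here.

```python
import numpy as np

# Floyd-Steinberg error diffusion: the standard equal-resolution baseline.
# Each pixel is thresholded and its quantization error is spread to the
# unprocessed neighbors with the classic 7/16, 3/16, 5/16, 1/16 weights.
def floyd_steinberg(gray):
    """gray: (H, W) float array in [0, 1]; returns a binary halftone."""
    img = gray.astype(np.float64).copy()
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            out[y, x] = 1.0 if img[y, x] >= 0.5 else 0.0
            err = img[y, x] - out[y, x]
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h and x > 0:
                img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:
                img[y + 1, x] += err * 5 / 16
            if y + 1 < h and x + 1 < w:
                img[y + 1, x + 1] += err * 1 / 16
    return out
```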
For the third problem, a novel face set recognition method is presented. Face set recognition is important for face video analysis and face clustering in many imaging systems. It is challenging because of variations in image sharpness, face orientation, and illumination across frames, as well as the varying number and order of images in the face set. To tackle the problem, a novel convolutional neural network system is presented that generates a fixed-dimensional compact feature representation for the face set. The system collects information from all images in the set while emphasizing more frontal and sharper face images, and it is invariant to the number and order of the images. The generated feature representations allow direct similarity computation between face sets and can therefore be used directly for recognition. Experimental results show that our method outperforms other state-of-the-art methods on a public test dataset.
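A minimal sketch of one way to realize such an order-invariant, quality-weighted aggregation, assuming generic `backbone` and `quality_head` networks (the thesis's actual architecture is not detailed in the abstract):

```python
import torch
import torch.nn as nn

# Aggregate per-image embeddings into one fixed-dimensional set descriptor.
# Softmax weighting lets sharper, more frontal faces dominate, and the
# weighted sum is invariant to the number and order of images in the set.
class FaceSetEncoder(nn.Module):
    def __init__(self, backbone: nn.Module, quality_head: nn.Module):
        super().__init__()
        self.backbone = backbone          # images -> (n, d) embeddings
        self.quality_head = quality_head  # embedding -> (n, 1) quality score

    def forward(self, images):            # images: (n, C, H, W), any n
        emb = self.backbone(images)                       # (n, d)
        w = torch.softmax(self.quality_head(emb), dim=0)  # weights sum to 1
        return (w * emb).sum(dim=0)       # fixed-dimensional descriptor
```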
|
162 |
Efficient implementation of deep neural networks (Efektivní implementace hlubokých neuronových sítí). Kopál, Jakub, January 2020
In recent years, algorithms in the area of object detection have constantly improved. Their success has reached a level where much of the development focuses on increasing speed at the expense of accuracy. As a result of recent improvements in deep learning and new hardware architectures optimized for deep learning models, it is possible to detect objects in an image several hundred times per second using only embedded and mobile devices. The main objective of this thesis is to study and summarize the most important methods in the area of efficient object detection and apply them to a given real-world problem. Using state-of-the-art methods, we developed a tracking-by-detection algorithm, based on our own object detection models, that tracks transport vehicles in real time using embedded and mobile devices.
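A tracking-by-detection loop can be reduced to the greedy IoU association sketched below; this is a common baseline, assumed here for illustration, not the thesis's exact association logic.

```python
# Greedy IoU tracker: match each existing track to its best-overlapping
# detection, then open new tracks for leftover detections. Boxes are
# (x1, y1, x2, y2) tuples; track termination is omitted for brevity.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

class GreedyIouTracker:
    def __init__(self, iou_thresh=0.3):
        self.iou_thresh = iou_thresh
        self.tracks = {}   # track_id -> last seen box
        self.next_id = 0

    def update(self, detections):
        unmatched = list(detections)
        for tid, box in list(self.tracks.items()):
            if not unmatched:
                break
            best = max(unmatched, key=lambda d: iou(box, d))
            if iou(box, best) >= self.iou_thresh:
                self.tracks[tid] = best
                unmatched.remove(best)
        for det in unmatched:              # new objects enter the scene
            self.tracks[self.next_id] = det
            self.next_id += 1
        return dict(self.tracks)
```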
|
163 |
Convolutional Kernel Networks for Action Recognition in Videos. Wynen, Daan, January 2015
While convolutional neural networks (CNNs) have taken the lead in many learning tasks, action recognition in videos has yet to see this jump in performance. Many teams are working on the issue, but so far there is no definitive answer as to how to make CNNs work well with video data. Recently, convolutional kernel networks (CKNs) were introduced: a special case of CNNs that can be trained layer by layer in an unsupervised manner, by approximating a kernel function in every layer with finite-dimensional descriptors. In this work we show the application of CKN training to video, discuss the necessary adjustments, and examine the influence of the type of data presented to the networks as well as the number of filters used.
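A minimal sketch of one such layer, assuming a Gaussian kernel approximated with anchor patches sampled from the data (a Nyström-style simplification; actual CKN training optimizes this approximation):

```python
import numpy as np

# One CKN-style layer: map each l2-normalized patch to a finite-dimensional
# descriptor of Gaussian-kernel similarities to a set of anchor patches,
# chosen here by random sampling (unsupervised, no labels needed).
def ckn_layer(patches, n_filters=64, sigma=0.5, seed=0):
    """patches: (n, d) array of l2-normalized patches; returns (n, n_filters)."""
    rng = np.random.default_rng(seed)
    anchors = patches[rng.choice(len(patches), n_filters, replace=False)]
    d2 = ((patches[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))
```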
|
164 |
Detection and Segmentation of Brain Metastases with Deep Convolutional Networks. Losch, Max, January 2015
As deep convolutional networks (ConvNets) reach spectacular results on a multitude of computer vision tasks, and perform almost as well as a human rater on the task of segmenting gliomas in the brain, I investigated their applicability to detecting and segmenting brain metastases. I trained networks of increasing depth to improve the detection rate and introduced a border-pair scheme to reduce oversegmentation. A constraint on the time allowed for segmenting a complete brain scan required fully convolutional networks, which reduced the time from 90 minutes to 40 seconds. Despite some noise and label errors in the 490 full-brain MRI scans, the final network achieves a true positive rate of 82.8% and 0.05 misclassifications per slice, with all lesions larger than 3 mm detected perfectly. This work indicates that ConvNets are a suitable approach to both detecting and segmenting metastases, especially as further architectural extensions might improve predictive performance even more.
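The time reduction rests on the standard dense-to-fully-convolutional conversion, sketched below with illustrative layer sizes (assuming 32x32 training patches, not the thesis's actual dimensions):

```python
import torch.nn as nn

# Patch classifier: trained on fixed-size patches, one prediction per patch.
patch_net = nn.Sequential(
    nn.Conv2d(1, 32, 5), nn.ReLU(),   # 32x32 -> 28x28
    nn.MaxPool2d(2),                  # -> 14x14
    nn.Conv2d(32, 64, 5), nn.ReLU(),  # -> 10x10
    nn.Flatten(),
    nn.Linear(64 * 10 * 10, 2),
)

# Fully convolutional equivalent: the Linear head becomes a 10x10 conv,
# so a whole slice is scored in one pass instead of patch by patch.
fcn = nn.Sequential(
    nn.Conv2d(1, 32, 5), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 5), nn.ReLU(),
    nn.Conv2d(64, 2, 10),
)
fcn[5].weight.data = patch_net[6].weight.data.view(2, 64, 10, 10)
fcn[5].bias.data = patch_net[6].bias.data
```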
|
165 |
Curriculum Learning with Deep Convolutional Neural Networks. Avramova, Vanya, January 2015
Curriculum learning is a machine learning technique inspired by the way humans acquire knowledge and skills: by mastering simple concepts first and progressing through information of increasing difficulty to grasp more complex topics. Curriculum learning and its derivatives, Self-Paced Learning (SPL) and Self-Paced Learning with Diversity (SPLD), have previously been applied in various machine learning contexts: support vector machines (SVMs), perceptrons, and multi-layer neural networks, where they have been shown to improve both training speed and model accuracy. This project ventured to apply the techniques in the previously unexplored context of deep learning, by investigating how they affect the performance of a deep convolutional neural network (ConvNet) trained on a large labeled image dataset. The curriculum was formed by presenting the training samples to the network in order of increasing difficulty, measured by each sample's loss value under the network's objective function. The project evaluated SPL and SPLD, and proposed two new curriculum learning sub-variants, p-SPL and p-SPLD, which allow for a smooth progression of sample inclusion during training. The project also explored "inversed" versions of the SPL, SPLD, p-SPL and p-SPLD techniques, where samples were selected for the curriculum in order of decreasing difficulty. The experiments demonstrated that all learning variants perform fairly similarly, within a ≈1% average test accuracy margin, based on five trained models per variant. Surprisingly, models trained with the inversed versions of the algorithms performed slightly better than the standard curriculum training variants. The SPLD-Inversed, SPL-Inversed and SPLD networks also registered marginally higher accuracy than the network trained with the usual random sample presentation. The results suggest that while sample ordering does affect the training process, the optimal order in which samples are presented may vary with the dataset and algorithm used. The project also investigated whether some samples were more beneficial for the training process than others. Based on sample difficulty, subsets of samples were removed from the training dataset, and the models trained on the remaining samples were compared to a default model trained on all samples. On the dataset used, removing the "easiest" 10% of samples had no effect on the achieved test accuracy compared to the default model, and removing the "easiest" 40% of samples reduced model accuracy by only ≈1% (compared to a ≈6% loss when 40% of the "most difficult" samples were removed, and a ≈3% loss when 40% of samples were removed at random). Taking away the "easiest" samples first (up to a certain percentage of the dataset) affected the learning process less negatively than removing random samples, while removing the "most difficult" samples first had the most detrimental effect. The results suggest that the networks derived most of their learning value from the "difficult" samples, and that a large subset of the "easiest" samples can be excluded from training with minimal impact on the attained model accuracy. Moreover, it is possible to identify these samples early during training, which can greatly reduce the training time for these models.
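For concreteness, the core SPL selection rule and the difficulty ordering can be sketched as below; the linear pace schedule is an assumption, and the p-SPL/p-SPLD smoothing is not reproduced.

```python
import numpy as np

# Self-paced selection: train only on samples whose current loss is below a
# pace threshold that grows over time, so harder samples enter gradually.
def spl_select(losses, lam):
    return np.where(losses < lam)[0]

# Curriculum ordering: easiest-first by loss, or hardest-first ("inversed").
def curriculum_order(losses, inversed=False):
    order = np.argsort(losses)
    return order[::-1] if inversed else order

losses = np.random.rand(1000)          # stand-in per-sample loss values
for lam in np.linspace(0.2, 1.0, 5):   # assumed pace schedule
    active = spl_select(losses, lam)
    # ...train one epoch on `active`, then recompute `losses`...
```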
|
166 |
Deep learning for measuring radon on plastic films. Löfgren, Max, January 2021
Exposure to high levels of radon can be very harmful and lead to serious health issues. Measuring radon in buildings and houses is an important preventive measure. One way of measuring radon is to place plastic films to be exposed over a period of time and then analyze images of them. Image processing together with deep learning has become very useful for image recognition and analysis, and training artificial neural networks on huge amounts of data to classify and predict new data is a widely used approach. In this project, artificial neural networks were trained to predict the radon measurement of exposed plastic films. The data consisted of microscopic images of these films, which were first modified to better fit the training and then sorted into two datasets. The datasets were divided into 10 classes with measurement values in intervals of 100 up to 1000. Two main types of neural networks were used, in different shapes and with different training parameters: convolutional neural networks and dense neural networks. The convolutional model was able to predict new data with 70 percent accuracy, and its performance increased with a bigger image size (more pixels) but not with a deeper network architecture. Over 90 percent of the wrong predictions also belonged to the class in the interval just above or below the predicted result, which shows that the network has potential for improvement. The dense model had only 35 percent accuracy despite a training accuracy of over 90 percent, because it was heavily overfitted. A way to get better results could be to enlarge the dataset with more images.
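The 10-class labeling amounts to simple interval binning of the measurement values, as in this sketch:

```python
# Bin a radon measurement into one of 10 classes: [0, 100) -> 0, ...,
# [900, 1000] -> 9. Values at or above 1000 are clipped into the top class.
def radon_class(value):
    return min(int(value // 100), 9)

assert [radon_class(v) for v in (42.0, 310.5, 999.0)] == [0, 3, 9]
```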
|
167 |
AI-Driven Image Manipulation: Image Outpainting Applied on Fashion Images. Mennborg, Alexander, January 2021
The e-commerce industry frequently has to display product images on a website where the images are provided by the selling partners. The images in question can have drastically different aspect ratios and resolutions, which makes it harder to present them while maintaining a coherent user experience. Manipulating images by cropping can sometimes cut off parts of the foreground (i.e., the product or person within the image). Image outpainting is a technique that allows images to be extended past their boundaries and can be used to alter the aspect ratio of images. Combined with object detection for locating the foreground, it makes it possible to manipulate images without sacrificing parts of the foreground. For image outpainting, a deep learning model was trained on product images; it can extend images by at least 25%. The model achieves an FID score of 8.29, a PSNR score of 44.29, and a BRISQUE score of 39.95. To test this solution in practice, a simple image manipulation pipeline was created that uses image outpainting when needed, and it shows promising results. Images can be manipulated in under a second on a ZOTAC GeForce RTX 3060 (12 GB) GPU and in a few seconds on an Intel Core i7-8700K (16 GB) CPU. There is also a special case of images where the background has been digitally replaced with a solid color; these can be outpainted even faster without deep learning.
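The solid-background special case avoids deep learning because outpainting degenerates to padding with the background color, roughly as sketched below (estimating the color from a corner pixel is an assumption):

```python
import numpy as np

# Extend a solid-background image to a target aspect ratio by padding with
# the background color and centering the original content.
def pad_to_aspect(img, target_ratio):
    """img: (H, W, 3) array; target_ratio = width / height."""
    h, w, _ = img.shape
    bg = img[0, 0]                          # assumed background color sample
    new_w = max(w, int(round(h * target_ratio)))
    new_h = max(h, int(round(w / target_ratio)))
    canvas = np.tile(bg, (new_h, new_w, 1)).astype(img.dtype)
    y0, x0 = (new_h - h) // 2, (new_w - w) // 2
    canvas[y0:y0 + h, x0:x0 + w] = img
    return canvas
```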
|
168 |
Convolutional neural networks applied to the emulation of the psychoacoustic model for MPEG-1, Layer I audio signal encoders. Sanchez Huapaya, Alonso Sebastián; Serpa Pinillos, Sergio André, 26 August 2020
The present work proposes four encoder alternatives inspired by the MPEG-1, Layer I encoder described in the ISO/IEC 11172-3 standard. The problem addressed is the need to explicitly define a psychoacoustic model in order to encode audio; here, that model is instead replaced by neural networks. All the proposals are based on multiscale convolutional neural networks (MCNNs) that emulate psychoacoustic model 1 of the referred encoder. The networks have 32 inputs that map the 32 subbands of the sound pressure level (SPL), and a single output that corresponds to one of the 32 subbands of either the signal-to-mask ratio (SMR) or the bit allocation vector. Thus, an encoder is composed of a set of 32 neural networks. The validation process took the first 10 seconds of 15 randomly chosen songs from 10 different musical genres. The audio signal quality of the proposed encoders was compared to that of the MPEG-1, Layer I encoder using the ODG metric. The encoder whose input is the SPL and whose output is the SMR, proposed by Guillermo Kemper, yielded the best results at 96 kbps and 192 kbps. The encoder named "SBU1" had the best results at 128 kbps.
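A minimal sketch of the 32-network arrangement, with illustrative layer sizes rather than the MCNN topology actually used:

```python
import torch
import torch.nn as nn

# One small network per subband: each takes all 32 SPL subbands over time
# and predicts its own subband of the SMR (or bit allocation) sequence.
def make_subband_net(n_subbands=32, hidden=16, kernel=5):
    return nn.Sequential(
        nn.Conv1d(n_subbands, hidden, kernel, padding=kernel // 2),
        nn.ReLU(),
        nn.Conv1d(hidden, 1, 1),
    )

encoder_nets = nn.ModuleList([make_subband_net() for _ in range(32)])

spl = torch.randn(1, 32, 128)   # (batch, SPL subbands, frames)
smr = torch.cat([net(spl) for net in encoder_nets], dim=1)  # (1, 32, 128)
```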
|
169 |
Prediction of velocity and absolute velocity of movement from human intracranial EEG data using deep neural networks (Predikce rychlosti a absolutni rychlosti pohybu z lidských intrakraniálních EEG dat pomocí hlubokých neuronových sítí). Vystrčilová, Michaela, January 2021
Our brain controls the processes of the body, including movement. In this thesis, we try to understand how information about hand movement is encoded in the brain's electrical activity and how this activity can be used to predict the velocity and absolute velocity of hand movements. Using a well-established deep neural network architecture for EEG decoding, the Deep4Net, we predict hand movement velocity and absolute velocity from intracranial EEG signals. While reaching the expected performance level, we determine the influence of different frequency bands on the network's predictions. We find that modulations in the high-gamma frequency band are less informative than expected based on previous studies. We also make two architectural observations: first, removing the max-pooling layers in the architecture leads to significantly higher correlations; second, the non-uniform receptive field of the network is a potential drawback that biases the network towards less relevant information.
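The pooling-removal modification can be sketched generically as below, using a plain nn.Sequential stand-in for the Deep4Net (whose real implementation exposes named submodules, so the actual edit targets those):

```python
import torch.nn as nn

# Drop all max-pooling layers from a sequential model. Note that removing
# pooling changes the temporal length of every later feature map, so any
# following layers with fixed sizes must be re-dimensioned accordingly.
def remove_max_pooling(model: nn.Sequential) -> nn.Sequential:
    kept = [m for m in model if not isinstance(m, (nn.MaxPool1d, nn.MaxPool2d))]
    return nn.Sequential(*kept)
```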
|
170 |
Deep Learning for 3D Perception: Computer Vision and Tactile Sensing. Garcia-Garcia, Alberto, 23 October 2019
The care of dependent people (due to aging, accidents, disabilities, or illnesses) is one of the top-priority lines of research for the European countries, as stated in the Horizon 2020 goals. In order to minimize the cost and the intrusiveness of care and rehabilitation therapies, it is desirable that such care be administered at the patient's home. The natural solution for this environment is an indoor mobile robotic platform. Such a robotic platform for home care needs to solve, to a certain extent, a set of problems that lie at the intersection of multiple disciplines, e.g., computer vision, machine learning, and robotics. At that crossroads, one of the most notable challenges (and the one we focus on) is scene understanding: the robot needs to understand the unstructured and dynamic environment in which it navigates and the objects with which it can interact. To achieve full scene understanding, various tasks must be accomplished. In this thesis we focus on three of them: object class recognition, semantic segmentation, and grasp stability prediction. The first refers to the process of categorizing an object into a set of classes (e.g., chair, bed, or pillow); the second goes one level beyond object categorization and aims to provide a per-pixel dense labeling of each object in an image; the third consists of determining whether an object grasped by a robotic hand is in a stable configuration or will fall. This thesis presents contributions towards solving those three tasks using deep learning as the main tool for the recognition, segmentation, and prediction problems. All the solutions share one core observation: they rely on three-dimensional data inputs to leverage that additional dimension and its spatial arrangement. The four main contributions of this thesis are: first, we present a set of architectures and data representations for 3D object classification using point clouds; second, we carry out an extensive review of the state of the art in semantic segmentation datasets and methods; third, we introduce a novel synthetic, large-scale, photorealistic dataset for solving various robotic and vision problems together; finally, we propose a novel method and representation to deal with tactile sensors and to learn to predict grasp stability.
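As a flavor of the point-cloud classification architectures in the first contribution, a PointNet-style sketch with a shared per-point MLP and symmetric max-pooling (an illustrative assumption, not the thesis's exact models):

```python
import torch
import torch.nn as nn

# Per-point features are computed with shared weights, then max-pooled into
# a single global descriptor, making the model invariant to point order.
class PointCloudClassifier(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        self.head = nn.Linear(128, n_classes)

    def forward(self, points):                 # (batch, n_points, 3)
        feats = self.point_mlp(points)         # (batch, n_points, 128)
        global_feat = feats.max(dim=1).values  # symmetric pooling
        return self.head(global_feat)

logits = PointCloudClassifier()(torch.randn(2, 1024, 3))  # -> (2, 10)
```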
|