191 |
FACIAL EXPRESSION ANALYSIS USING DEEP LEARNING WITH PARTIAL INTEGRATION TO OTHER MODALITIES TO DETECT EMOTION
Ghayoumi, Mehdi 01 August 2017 (has links)
No description available.
|
192 |
A Study of Random Partitions vs. Patient-Based Partitions in Breast Cancer Tumor Detection using Convolutional Neural Networks
Ramos, Joshua N 01 March 2024 (has links) (PDF)
Breast cancer is one of the deadliest cancers for women. In the US, 1 in 8 women will be diagnosed with breast cancer within their lifetimes. Detection and diagnosis play an important role in saving lives. To this end, many classifiers with varying structures have been designed to classify breast cancer histopathological images. However, randomly partitioning data, as many previous works have done, can lead to artificially inflated accuracies and classifiers that do not generalize. Data leakage occurs when researchers assume that every image in a dataset is independent of the others, which is often not the case for medical datasets, where multiple images are taken of each patient. This work focuses on convolutional neural network binary classifiers using the BreakHis dataset. Previous works are reviewed, and classifiers from the literature are tested with patient partitioning, where each individual patient is placed in only one of the training, testing and validation sets so that there is no overlap. A classifier that had previously achieved 93% accuracy consistently reached only 79% accuracy with the new patient partition. Robust data augmentation, a Sigmoid output layer and a different form of min-max normalization were then used to reach an accuracy of 89.38%. These improvements were shown to be effective with the architectures used, and Sigmoid Model 1.1 performs well compared to much deeper architectures found in the literature.
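The patient-wise partitioning the abstract describes can be sketched roughly as follows; the `patient_id` field and the record layout are illustrative assumptions, not the actual BreakHis metadata format:

```python
import random

def patient_partition(images, train_frac=0.7, val_frac=0.15, seed=42):
    """Split image records into train/val/test so that all images from a
    given patient land in exactly one split -- avoiding the data leakage
    that random per-image partitioning causes."""
    patients = sorted({img["patient_id"] for img in images})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n = len(patients)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    groups = {
        "train": set(patients[:n_train]),
        "val": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),
    }
    splits = {name: [] for name in groups}
    for img in images:
        for name, pats in groups.items():
            if img["patient_id"] in pats:
                splits[name].append(img)
    return splits
```

Splitting by patient rather than by image is what exposes the accuracy drop the abstract reports: the model can no longer memorize patient-specific tissue appearance across splits.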
|
193 |
A Deep Learning Approach to Detection and Classification of Small Defects on Painted Surfaces : A Study Made on Volvo GTO, Umeå
Rönnqvist, Johannes, Sjölund, Johannes January 2019 (has links)
In this thesis we conclude that convolutional neural networks, together with phase-measuring deflectometry techniques, can be used to create models which detect and classify defects on painted surfaces very well, even compared to experienced humans. Further, we show which preprocessing measures enhance the performance of the models. We see that standardisation increases the classification accuracy of the models. We demonstrate that cleaning the data through relabelling and removing faulty images improves classification accuracy, and especially the models' ability to distinguish between different types of defects. We show that oversampling might be a feasible method to improve accuracy by increasing and balancing the data set through augmentation of existing observations. Lastly, we find that combining many images with different patterns heavily increases the classification accuracy of the models. Our proposed approach is demonstrated to work well in a real-time factory environment. An automated quality control of the painted surfaces of Volvo Truck cabins could give great benefits in cost and quality. It could provide data for root-cause analysis and a quick and efficient alarm system, significantly streamlining production while reducing costs and errors. Corrections and optimisation of the processes could then be made at earlier stages and with higher precision than today.
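The oversampling idea mentioned above, balancing the data set by augmenting existing observations, might look like this minimal sketch; the horizontal-flip augmentation and binary labels are assumptions, not the thesis's exact pipeline:

```python
import numpy as np

def oversample_balance(images, labels, rng=None):
    """Balance a binary image dataset by duplicating randomly chosen
    minority-class images, horizontally flipped as a cheap augmentation,
    until both classes have the same count."""
    rng = rng or np.random.default_rng(0)
    labels = np.asarray(labels)
    counts = {c: int((labels == c).sum()) for c in np.unique(labels)}
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    idx = np.flatnonzero(labels == minority)
    extra = rng.choice(idx, size=deficit, replace=True)
    augmented = [np.fliplr(images[i]) for i in extra]  # left-right flip
    return list(images) + augmented, list(labels) + [minority] * deficit
```

In practice one would draw from a richer set of augmentations (rotations, brightness shifts) so the duplicates are not near-identical.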
|
194 |
Image Based Oak Species Classification Using Deep Learning Approach
Shiferaw, Adisalem Hadush, Keklik, Alican January 2024 (has links)
Real-time, scalable classification of the oak species Quercus petraea and Quercus robur with minimal human intervention is crucial for forest management, biodiversity conservation, and ecological monitoring. Traditional methods are labor-intensive and costly, motivating the exploration of automated solutions. This study addresses the research problem of developing an efficient and scalable classification system using deep learning techniques. We developed a Convolutional Neural Network (CNN) from scratch and enhanced its performance with segmentation, fusion, and data augmentation techniques. Using a dataset of 649 oak leaf images, our model achieved a classification accuracy of 69.30% with a standard deviation of 2.48% and demonstrated efficient real-time application with an average processing time of 25.53 milliseconds per image. These results demonstrate the potential of deep learning to automate and improve identification of the two oak species. This research provides a valuable tool for ecological studies and conservation efforts.
|
195 |
Deep neural networks and their application for image data processing
Golovizin, Andrey January 2016 (has links)
In the area of image recognition, the so-called deep neural networks belong to the most promising models these days. They often achieve considerably better results than traditional techniques, even without excessive task-oriented preprocessing. This thesis is devoted to the study and analysis of three basic variants of deep neural networks: the neocognitron, convolutional neural networks, and deep belief networks. Based on extensive testing of the described models on the standard task of handwritten digit recognition, convolutional neural networks seem to be the most suitable for the recognition of general image data. Therefore, we have also used them to classify images from two very large data sets, CIFAR-10 and ImageNet. In order to optimize the architecture of the applied networks, we have proposed a new pruning algorithm based on Principal Component Analysis.
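A pruning criterion based on Principal Component Analysis could, for instance, pick a target layer width from the variance retained by the leading components of the layer's activations; this is a hedged sketch, not necessarily the algorithm proposed in the thesis:

```python
import numpy as np

def pca_keep_count(activations, var_ratio=0.95):
    """Given a matrix of layer activations (samples x units), return the
    number of principal components needed to retain `var_ratio` of the
    total variance -- a candidate target width for pruning the layer."""
    X = activations - activations.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Eigenvalues of the covariance matrix, largest first
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
    cum = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cum, var_ratio) + 1)
```

Units beyond the returned count contribute little variance, so a pruning pass could remove them and fine-tune the remaining weights.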
|
196 |
Método para execução de redes neurais convolucionais em FPGA. / A method for execution of convolutional neural networks in FPGA.
Sousa, Mark Cappello Ferreira de 26 April 2019 (has links)
Convolutional Neural Networks have been used successfully for pattern recognition in images. However, their high computational cost and the large number of parameters involved make it difficult to execute this type of artificial neural network in real time in embedded applications, where processing power and data storage capacity are restricted. This work studied and developed a method for real-time execution of a trained Convolutional Neural Network on FPGAs, taking advantage of the parallel processing power of this type of device. The focus of this work was the execution of the convolutional layers, since these layers can contribute up to 99% of the computational load of the entire network. In the experiments, an FPGA device was used in conjunction with a dual-core ARM processor on the same silicon substrate; only the FPGA was used to execute the convolutional layers of the AlexNet Convolutional Neural Network. The method focuses on the efficient distribution of FPGA resources through balancing of the pipeline formed by the convolutional layers, the use of buffers to reduce and reuse memory for intermediate data (generated and consumed by the convolutional layers), and 8-bit numeric precision for kernel storage, which also increases kernel read throughput. With the developed method, it was possible to execute all five AlexNet convolutional layers in 3.9 ms at a maximum operating frequency of 76.9 MHz. It was also possible to store all parameters of the convolutional layers in the internal memory of the FPGA, eliminating possible external memory access bottlenecks.
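The 8-bit kernel storage described above can be illustrated with a simple min-max quantization sketch; the uint8 affine scheme is an assumption, and the thesis's exact fixed-point format may differ:

```python
import numpy as np

def quantize_kernels_8bit(kernels):
    """Min-max quantize float kernels to uint8, returning the (scale,
    offset) needed to dequantize. Storing 8-bit values instead of 32-bit
    floats quarters memory use and widens effective read throughput."""
    lo, hi = float(kernels.min()), float(kernels.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((kernels - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate float kernels from the quantized form."""
    return q.astype(np.float32) * scale + lo
```

Rounding to the nearest of 256 levels bounds the per-weight error by half a quantization step, which is typically tolerable for inference.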
|
197 |
Object Tracking Achieved by Implementing Predictive Methods with Static Object Detectors Trained on the Single Shot Detector Inception V2 Network / Objektdetektering Uppnådd genom Implementering av Prediktiva Metoder med Statiska Objektdetektorer Tränade på Entagningsdetektor Inception V2 Nätverket
Barkman, Richard Dan William January 2019 (has links)
In this work, the possibility of realising object tracking by implementing predictive methods with static object detectors is explored. The static object detectors are obtained as models trained with a machine learning algorithm, in other words, a deep neural network; specifically, the single shot detector inception v2 network is used to train such models. Predictive methods are incorporated with the aim of improving the obtained models' precision, i.e. their performance with respect to accuracy. Namely, Lagrangian mechanics is employed to derive equations of motion for three different scenarios in which the object is to be tracked. These equations of motion are implemented as predictive methods by discretising them and combining them with four different iterative formulae. In ch. 1, the fundamentals of supervised machine learning, neural networks and convolutional neural networks are established, as well as the workings of the single shot detector algorithm, approaches to hyperparameter optimisation and other relevant theory. This includes derivations of the relevant equations of motion and the iterative formulae with which they were implemented. In ch. 2, the experimental set-up used during data collection is described, along with the manner in which the acquired data was used to produce training, validation and test datasets. This is followed by a description of how random search was used to train 64 models on 300×300 datasets and 32 models on 512×512 datasets. Subsequently, these models are evaluated based on their performance with respect to camera-to-object distance and object velocity. In ch. 3, the trained models are verified to possess multi-scale detection capabilities, as is characteristic of models trained on the single shot detector network. While this holds irrespective of the resolution of the dataset a model was trained on, the performance with respect to varying object velocity is significantly more consistent for the lower-resolution models, as they operate at a higher detection rate. Ch. 3 continues with an evaluation of the implemented predictive methods, carried out by comparing the resulting deviations when they are made to predict the missing data points of a collected detection pattern at varying sampling percentages. The best predictive methods are found to be those that use the fewest previous data points; this follows from the fact that the data on which the evaluations were made contained a considerable amount of noise, which the implemented iterative formulae do not take into account. Moreover, the lower-resolution models benefit more than those trained on the higher-resolution datasets because of the higher detection frequency they can employ. In ch. 4, it is argued that the concept of combining predictive methods with static object detectors to obtain an object tracker is promising, and the models obtained with the single shot detector network are concluded to be good candidates for such applications. However, the predictive methods studied in this thesis should be replaced with, or extended into, methods that can account for noise. A notable finding is that single shot detector inception v2 models trained on a low-resolution dataset outperform those trained on a high-resolution dataset in certain regards, due to the higher detection rate possible on lower-resolution frames: namely, in performance with respect to object velocity, and in that the predictive methods performed better on the low-resolution models.
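The iterative prediction formulae could resemble simple finite-difference extrapolation of the detected positions; the sketch below is illustrative, not the thesis's exact discretisation, and it shares the noise-sensitivity the abstract discusses:

```python
def predict_next(points, order=1):
    """Predict the next (x, y) position from past detections using
    finite differences on a uniform time grid: order=1 assumes constant
    velocity (uses 2 points), order=2 constant acceleration (3 points)."""
    if order == 1:
        (x0, y0), (x1, y1) = points[-2:]
        # x_{n+1} = 2 x_n - x_{n-1}
        return (2 * x1 - x0, 2 * y1 - y0)
    (x0, y0), (x1, y1), (x2, y2) = points[-3:]
    # x_{n+1} = 3 x_n - 3 x_{n-1} + x_{n-2}
    return (3 * x2 - 3 * x1 + x0, 3 * y2 - 3 * y1 + y0)
```

Higher-order formulae reach further back in time, so measurement noise in any of those points is amplified in the prediction, consistent with the finding that the formulae using the fewest past points performed best on noisy detections.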
|
198 |
TDNet : A Generative Model for Taxi Demand Prediction / TDNet : En Generativ Modell för att Prediktera Taxiefterfrågan
Svensk, Gustav January 2019 (has links)
Supplying the right number of taxis in the right place at the right time is very important for taxi companies. In this paper, the machine learning model Taxi Demand Net (TDNet) is presented, which predicts short-term taxi demand in different zones of a city. It is based on WaveNet, a causal dilated convolutional neural network for time-series generation. TDNet uses historical demand from the last years and transforms features such as time of day, day of week and day of month into 26-hour taxi demand forecasts for all zones in a city. It has been applied to one city in northern Europe and one in South America. In the northern European city, an error of one taxi or less per hour per zone was achieved in 64% of the cases; in the South American city, the number was 40%. In both cities, TDNet beat the SARIMA and stacked-ensemble benchmarks. This performance was achieved by tuning the hyperparameters with a Bayesian optimization algorithm. Additionally, weather and holiday features were added as input features in the northern European city, but they did not improve the accuracy of TDNet.
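WaveNet's building block, the causal dilated convolution, can be sketched in a few lines; this single-channel version omits the gating and residual connections of the full architecture:

```python
import numpy as np

def causal_dilated_conv(x, w, dilation=1):
    """1-D causal convolution with dilation: output[t] depends only on
    x[t], x[t-d], x[t-2d], ... (zero-padded on the left), so no future
    values leak into the prediction. Stacking layers with doubling
    dilations gives the exponentially growing receptive field WaveNet
    uses to model long time-series contexts."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    out = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            out[t] += w[i] * xp[pad + t - i * dilation]
    return out
```

With dilations 1, 2, 4, 8, ... a stack of such layers can cover a day of hourly demand history with only a handful of layers.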
|
199 |
Prédiction de performances des systèmes de Reconnaissance Automatique de la Parole / Performance prediction of Automatic Speech Recognition systems
Elloumi, Zied 18 March 2019 (has links)
In this thesis, we focus on performance prediction of automatic speech recognition (ASR) systems. This is a very useful task for measuring the reliability of transcription hypotheses for a new data collection, when the reference transcription is unavailable and the ASR system used is unknown (black box). Our contribution covers several areas: first, we propose a heterogeneous French corpus for training and evaluating ASR performance prediction systems. We then compare two prediction approaches: a state-of-the-art (SOTA) performance prediction based on engineered features, and a new strategy based on features learnt with convolutional neural networks (CNNs). While the joint use of textual and signal features did not help the SOTA system, the combination of inputs for the CNNs leads to the best WER prediction performance. We also show that our CNN prediction remarkably predicts the shape of the WER distribution on a collection of speech recordings, whereas the SOTA approach generates a distribution far from reality. Then, we analyze factors impacting both prediction approaches. We also assess the impact of the training-set size of the prediction systems, as well as the robustness of systems learned on the outputs of a particular ASR system and used to predict performance on a new data collection. Our experimental results show that both prediction approaches are robust, and that the prediction task is more difficult on short speech turns and on spontaneous speech. Finally, we try to understand which information is captured by our neural model and its relation to different factors. Our experiments show that intermediate representations in the network implicitly encode information on the speech style, the speaker's accent and the broadcast program type. To take advantage of this analysis, we propose a multi-task system that is slightly more effective on the performance prediction task.
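The WER metric that the prediction systems target is the normalized word-level edit distance; a minimal implementation:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) divided
    by the number of reference words, computed via Levenshtein distance
    over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[-1][-1] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is one reason predicting its distribution over a collection is non-trivial.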
|
200 |
Super-Resolution for Fast Multi-Contrast Magnetic Resonance Imaging
Nilsson, Erik January 2019 (has links)
There are many clinical situations where magnetic resonance imaging (MRI) is preferable over other imaging modalities, while its major disadvantage is the relatively long scan time. Due to limited resources, this means that not all patients can be offered an MRI scan, even though it could provide crucial information; it can even be deemed unsafe for a critically ill patient to undergo the examination. In MRI, there is a trade-off between resolution, signal-to-noise ratio (SNR) and the time spent gathering data. When time is of utmost importance, we seek other methods to increase the resolution while preserving SNR and imaging time. In this work, I have studied one of the most promising methods for this task: constructing super-resolution algorithms that learn the mapping from a low-resolution image to a high-resolution image using convolutional neural networks. More specifically, I constructed networks capable of transferring high-frequency (HF) content, responsible for details in an image, from one kind of image to another. In this context, contrast or weight is used to describe what kind of image we look at. This work only explores the possibility of transferring HF content from T1-weighted images, which can be obtained quite quickly, to T2-weighted images, which would take much longer to acquire at similar quality. By doing so, the hope is to contribute to increased efficacy of MRI and reduce the problems associated with long scan times. At first, a relatively simple network was implemented to show, as a proof of concept, that transferring HF content between contrasts is possible. Next, a much more complex network was proposed, which successfully increases the resolution of MR images better than the commonly used bicubic interpolation method. This conclusion is drawn from a test in which 12 participants were asked to rate the two methods (p=0.0016). Both visual comparisons and quality measures, such as PSNR and SSIM, indicate that the proposed network outperforms a similar network that only utilizes images of one contrast. This suggests that HF content was successfully transferred between images of different contrasts, improving the reconstruction process. Thus, it could be argued that the proposed multi-contrast model could decrease scan time even further than its single-contrast counterpart. Hence, this way of performing multi-contrast super-resolution has the potential to increase the efficacy of MRI.
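The PSNR quality measure used in the comparison is straightforward to compute; a minimal sketch for images scaled to [0, 1]:

```python
import numpy as np

def psnr(reference, reconstruction, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    reconstruction; higher is better, infinite for identical images."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Unlike SSIM, PSNR is a purely pixel-wise measure, which is why the thesis complements it with visual comparisons and a rater study.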
|