191

A Study of Random Partitions vs. Patient-Based Partitions in Breast Cancer Tumor Detection using Convolutional Neural Networks

Ramos, Joshua N 01 March 2024
Breast cancer is one of the deadliest cancers for women: in the US, 1 in 8 women will be diagnosed with breast cancer within her lifetime, so detection and diagnosis play an important role in saving lives. To this end, many classifiers with varying structures have been designed to classify breast cancer histopathological images. However, randomly partitioning data, as many previous works have done, can lead to artificially inflated accuracies and classifiers that do not generalize. Data leakage occurs when researchers assume that every image in a dataset is independent of the others, which is often not the case for medical datasets, where multiple images are taken of each patient. This work focuses on convolutional neural network binary classifiers using the BreakHis dataset. Previous works are reviewed, and classifiers from the literature are retested with patient partitioning, where each patient's images are placed entirely within the training, testing, or validation set so that there is no overlap. A classifier that had previously achieved 93% accuracy consistently reached only 79% accuracy under the new patient partition. Robust data augmentation, a sigmoid output layer, and a different form of min-max normalization were then used to achieve an accuracy of 89.38%. These improvements were shown to be effective with the architectures used, and Sigmoid Model 1.1 is shown to perform well compared to much deeper architectures found in the literature.
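A patient-wise split of this kind can be expressed with a group-aware splitter. The following is a minimal sketch (not the thesis code; the image shapes, labels, and patient counts are made-up stand-ins) using scikit-learn:

```python
# Illustrative patient-wise partitioning to avoid data leakage: every image
# from a given patient lands on exactly one side of the split.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_images = 100
X = rng.normal(size=(n_images, 64, 64, 3))         # stand-in for histopathology images
y = rng.integers(0, 2, size=n_images)              # benign/malignant labels
patient_ids = rng.integers(0, 20, size=n_images)   # several images per patient

splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

# No patient contributes images to both the training and the test set.
assert set(patient_ids[train_idx]).isdisjoint(set(patient_ids[test_idx]))
```

A random partition, by contrast, would scatter each patient's near-duplicate images across both sets, which is exactly what inflates the reported accuracy.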
192

Image Based Oak Species Classification Using Deep Learning Approach

Shiferaw, Adisalem Hadush; Keklik, Alican January 2024
Real-time, scalable classification of the oak species Quercus petraea and Quercus robur with minimal human intervention is crucial for forest management, biodiversity conservation, and ecological monitoring. Traditional methods are labor-intensive and costly, motivating the exploration of automated solutions. This study addresses the research problem of developing an efficient and scalable classification system using deep learning techniques. We developed a Convolutional Neural Network (CNN) from scratch and enhanced its performance with segmentation, fusion, and data augmentation techniques. Using a dataset of 649 oak leaf images, our model achieved a classification accuracy of 69.30% with a standard deviation of 2.48% and demonstrated efficient real-time application with an average processing time of 25.53 milliseconds per image. These results demonstrate the potential of deep learning to automate and improve identification of the two oak species. This research provides a valuable tool for ecological studies and conservation efforts.
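A from-scratch binary CNN with augmentation, in the spirit of the study, might look like the sketch below; the input resolution, layer widths, and augmentation choices are assumptions, not the authors' exact configuration:

```python
# Minimal from-scratch binary CNN for two-species leaf classification.
import tensorflow as tf
from tensorflow.keras import layers, models

# Augmentation matters with only ~649 images; these transforms are assumed.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
])

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    augment,
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # Q. petraea vs. Q. robur
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```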
193

Deep neural networks and their application for image data processing

Golovizin, Andrey January 2016
In the area of image recognition, the so-called deep neural networks are among the most promising models today. They often achieve considerably better results than traditional techniques, even without excessive task-oriented preprocessing. This thesis is devoted to the study and analysis of three basic variants of deep neural networks, namely the neocognitron, convolutional neural networks, and deep belief networks. Based on extensive testing of the described models on the standard task of handwritten digit recognition, convolutional neural networks seem to be the most suitable for recognizing general image data. Therefore, we have also used them to classify images from two very large datasets, CIFAR-10 and ImageNet. In order to optimize the architecture of the applied networks, we have proposed a new pruning algorithm based on Principal Component Analysis.
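The idea behind PCA-based pruning can be sketched as follows: measure how many principal components of a layer's filter responses are needed to explain most of the variance, and keep only that many filters. The activations below are random stand-ins, and the 95% threshold is an assumed choice, not necessarily the thesis's criterion:

```python
# Sketch of PCA-guided pruning: if a layer's responses are well explained by
# far fewer components than it has filters, the layer can be made narrower.
import numpy as np
from sklearn.decomposition import PCA

n_samples, n_filters = 1000, 64
activations = np.random.randn(n_samples, n_filters)  # pooled per-filter responses

pca = PCA().fit(activations)
explained = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(explained, 0.95) + 1)   # components covering 95% variance
print(f"keep {n_keep} of {n_filters} filters")
```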
194

A method for execution of convolutional neural networks in FPGA

Sousa, Mark Cappello Ferreira de 26 April 2019
Convolutional neural networks have been used successfully for pattern recognition in images. However, their high computational cost and large number of parameters make it difficult to execute this type of artificial neural network in real time in embedded applications, where processing power and data storage capacity are restricted. This work studied and developed a method for the real-time execution of a trained convolutional neural network on FPGAs, taking advantage of the parallel processing power of this type of device. The focus of this work was the execution of the convolutional layers, since these layers can contribute up to 99% of the computational load of the entire network. In the experiments, an FPGA device was used together with a dual-core ARM processor on the same silicon substrate; only the FPGA was used to execute the convolutional layers of the AlexNet convolutional neural network. The method focuses on the efficient distribution of FPGA resources by balancing the pipeline formed by the convolutional layers, using buffers to reduce and reuse the memory that stores intermediate data (generated and consumed by the convolutional layers), and using 8-bit numeric precision to store the kernels and increase their read throughput. With the developed method, it was possible to execute all five AlexNet convolutional layers in 3.9 ms at a maximum operating frequency of 76.9 MHz. It was also possible to store all the parameters of the convolutional layers in the internal memory of the FPGA, eliminating potential external memory access bottlenecks.
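The 8-bit kernel storage can be illustrated with a standard min-max quantization scheme; this is a common formulation, not necessarily the exact one used in the thesis, and the kernel shape is just AlexNet's first convolutional layer used as an example:

```python
# Illustrative 8-bit min-max quantization of convolution kernels, the kind of
# scheme that shrinks kernel storage and raises read throughput on an FPGA.
import numpy as np

def quantize_uint8(w):
    """Map float weights onto [0, 255] with a scale and zero point."""
    scale = (w.max() - w.min()) / 255.0
    zero_point = np.round(-w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

kernels = np.random.randn(96, 3, 11, 11).astype(np.float32)  # AlexNet conv1 shape
q, s, z = quantize_uint8(kernels)
err = np.abs(dequantize(q, s, z) - kernels).max()
print(f"max reconstruction error: {err:.5f}")   # cost of 4x smaller storage
```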
195

Object Tracking Achieved by Implementing Predictive Methods with Static Object Detectors Trained on the Single Shot Detector Inception V2 Network

Barkman, Richard Dan William January 2019
In this work, the possibility of realising object tracking by implementing predictive methods with static object detectors is explored. The static object detectors are obtained as models trained with a machine learning algorithm, in other words a deep neural network; specifically, the single shot detector inception v2 network is used to train such models. Predictive methods are incorporated with the aim of improving the obtained models' precision, i.e. their performance with respect to accuracy. Namely, Lagrangian mechanics is employed to derive equations of motion for three different scenarios in which the object is to be tracked. These equations of motion are implemented as predictive methods by discretising them and combining them with four different iterative formulae. In ch. 1, the fundamentals of supervised machine learning, neural networks, convolutional neural networks, the workings of the single shot detector algorithm, approaches to hyperparameter optimisation, and other relevant theory are established, including derivations of the relevant equations of motion and the iterative formulae with which they were implemented. In ch. 2, the experimental set-up used during data collection is described, along with the manner in which the acquired data was used to produce training, validation and test datasets. This is followed by a description of how random search was used to train 64 models on 300×300 datasets and 32 models on 512×512 datasets. These models are then evaluated based on their performance with respect to camera-to-object distance and object velocity. In ch. 3, the trained models are verified to possess multi-scale detection capabilities, as is characteristic of models trained on the single shot detector network. While this holds irrespective of the resolution of the dataset a model was trained on, performance with respect to varying object velocity is found to be significantly more consistent for the lower-resolution models, as they operate at a higher detection rate. Ch. 3 continues with an evaluation of the implemented predictive methods, carried out by comparing the resulting deviations when they are made to predict the missing data points of a collected detection pattern at varying sampling percentages. The best predictive methods turn out to be those that use the fewest previous data points; this follows from the fact that the evaluation data contained a considerable amount of noise, which the implemented iterative formulae do not take into account. Moreover, the lower-resolution models benefit more than those trained on the higher-resolution datasets because of the higher detection frequency they can employ. In ch. 4, it is argued that the concept of combining predictive methods with static object detectors to obtain an object tracker is promising, and the models obtained from the single shot detector network are concluded to be good candidates for such applications. However, the predictive methods studied in this thesis should be replaced with, or extended into, methods that can account for noise. A notable finding is that the single shot detector inception v2 models trained on a low-resolution dataset outperform those trained on a high-resolution dataset in certain regards, owing to the higher detection rate possible on lower-resolution frames: namely, in performance with respect to object velocity, and in that the predictive methods performed better on the low-resolution models.
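The trade-off between predictors built from few versus many past detections can be illustrated with standard backward-difference extrapolators; the coordinates below are hypothetical detections, and these two formulae are representative examples rather than the four iterative formulae derived in the thesis:

```python
# Backward-difference extrapolation of the next detection. Higher-order
# predictors use more past points but amplify detection noise, which is why
# the low-order ones fared better on the noisy detection patterns.
import numpy as np

def predict_const_velocity(p):
    """Next position assuming constant velocity (uses the last 2 points)."""
    return 2 * p[-1] - p[-2]

def predict_const_acceleration(p):
    """Next position assuming constant acceleration (uses the last 3 points)."""
    return 3 * p[-1] - 3 * p[-2] + p[-3]

track = np.array([[0.0, 0.0], [1.0, 0.5], [2.1, 1.1], [3.0, 1.4]])  # (x, y) per frame
print(predict_const_velocity(track), predict_const_acceleration(track))
```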
196

TDNet: A Generative Model for Taxi Demand Prediction

Svensk, Gustav January 2019
Supplying the right number of taxis in the right place at the right time is very important for taxi companies. In this paper, the machine learning model Taxi Demand Net (TDNet) is presented, which predicts short-term taxi demand in different zones of a city. It is based on WaveNet, a causal dilated convolutional neural network for time-series generation. TDNet uses historical demand from the last years and transforms features such as time of day, day of week and day of month into 26-hour taxi demand forecasts for all zones in a city. It has been applied to one city in northern Europe and one in South America. In northern Europe, an error of one taxi or less per hour per zone was achieved in 64% of the cases; in South America the figure was 40%. In both cities it beat the SARIMA and stacked-ensemble benchmarks. This performance was achieved by tuning the hyperparameters with a Bayesian optimization algorithm. Additionally, weather and holiday features were added as inputs in the northern European city, but they did not improve the accuracy of TDNet.
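The WaveNet mechanism TDNet builds on, a stack of causal dilated 1-D convolutions, can be sketched as follows in Keras; the depths, widths, input window, and output wiring are illustrative assumptions, not TDNet's actual configuration:

```python
# Minimal WaveNet-style stack: causal padding keeps each output from seeing
# the future, and doubling dilation rates grow the receptive field cheaply.
import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input(shape=(168, 1))            # e.g. one week of hourly demand
x = inputs
for dilation in (1, 2, 4, 8, 16):                # receptive field doubles per layer
    x = layers.Conv1D(32, kernel_size=2, padding="causal",
                      dilation_rate=dilation, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(26)(x)                    # 26-hour forecast for one zone
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mae")
```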
197

Performance prediction of Automatic Speech Recognition systems

Elloumi, Zied 18 March 2019
In this thesis, we focus on performance prediction for automatic speech recognition (ASR) systems. This is a very useful task for measuring the reliability of transcription hypotheses on a new data collection, when the reference transcription is unavailable and the ASR system used is unknown (black box). Our contribution covers several areas: first, we propose a heterogeneous French corpus for training and evaluating ASR prediction systems as well as ASR systems. We then compare two prediction approaches: a state-of-the-art (SOTA) performance predictor based on engineered features, and a new strategy based on features learnt with convolutional neural networks (CNNs). While the joint use of textual and signal features did not help the SOTA system, combining the inputs for the CNNs leads to the best WER prediction performance. We also show that the CNN remarkably predicts the shape of the WER distribution over a collection of speech recordings, whereas the SOTA approach generates a distribution far from reality. Then, we analyze factors impacting both prediction approaches. We also assess the impact of the training set size of the prediction systems, as well as the robustness of systems trained on the outputs of one particular ASR system and used to predict performance on a new data collection. Our experimental results show that both prediction approaches are robust, and that the prediction task is more difficult on short speech turns and on spontaneous speech. Finally, we try to understand which information is captured by our neural model and how it relates to different factors. Our experiments show that intermediate representations in the network implicitly encode information on speech style, speaker accent and broadcast-program type. To take advantage of this analysis, we propose a multi-task system that is slightly more effective on the performance prediction task.
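The multi-task setup can be sketched as a shared convolutional trunk with a WER-regression head and an auxiliary classification head; the input shape, layer sizes, auxiliary task, and loss weights below are assumptions for illustration:

```python
# Sketch of a multi-task predictor: the shared trunk is nudged to encode the
# same style information the analysis found in intermediate representations.
import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=(500, 40))                   # e.g. frames x filterbank dims
x = layers.Conv1D(64, 5, activation="relu")(inp)
x = layers.GlobalMaxPooling1D()(x)
wer = layers.Dense(1, name="wer")(x)                  # WER regression head
style = layers.Dense(2, activation="softmax",
                     name="style")(x)                 # e.g. spontaneous vs. prepared
model = tf.keras.Model(inp, [wer, style])
model.compile(optimizer="adam",
              loss={"wer": "mse", "style": "sparse_categorical_crossentropy"},
              loss_weights={"wer": 1.0, "style": 0.3})
```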
198

Super-Resolution for Fast Multi-Contrast Magnetic Resonance Imaging

Nilsson, Erik January 2019
There are many clinical situations where magnetic resonance imaging (MRI) is preferable over other imaging modalities, but its major disadvantage is the relatively long scan time. Due to limited resources, this means that not all patients can be offered an MRI scan, even though it could provide crucial information; it can even be deemed unsafe for a critically ill patient to undergo the examination. In MRI, there is a trade-off between resolution, signal-to-noise ratio (SNR) and the time spent gathering data. When time is of utmost importance, we seek other methods to increase the resolution while preserving SNR and imaging time. In this work, I have studied one of the most promising methods for this task: constructing super-resolution algorithms that learn the mapping from a low-resolution image to a high-resolution image using convolutional neural networks. More specifically, I constructed networks capable of transferring high-frequency (HF) content, responsible for details in an image, from one kind of image to another. In this context, contrast or weighting describes what kind of image we look at. This work explores the possibility of transferring HF content from T1-weighted images, which can be obtained quite quickly, to T2-weighted images, which would take much longer to acquire at similar quality. By doing so, the hope is to contribute to increased efficacy of MRI and to reduce the problems associated with long scan times. At first, a relatively simple network was implemented as a proof of concept, showing that transferring HF content between contrasts is possible. Next, a much more complex network was proposed that increases the resolution of MR images better than the commonly used bicubic interpolation method, a conclusion drawn from a test in which 12 participants were asked to rate the two methods (p = 0.0016). Both visual comparisons and quality measures, such as PSNR and SSIM, indicate that the proposed network outperforms a similar network that only utilizes images of one contrast. This suggests that HF content was successfully transferred between images of different contrasts, improving the reconstruction process. It could thus be argued that the proposed multi-contrast model could decrease scan time even further than its single-contrast counterpart would. Hence, this way of performing multi-contrast super-resolution has the potential to increase the efficacy of MRI.
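The multi-contrast idea can be sketched as a two-input network that sees both the upsampled low-resolution T2 image and the registered high-resolution T1 image, and predicts only the missing HF residual; the shapes and depths below are assumptions, not the thesis architecture:

```python
# Two-input super-resolution sketch: HF detail from the fast T1 scan guides
# the reconstruction of the slow T2 contrast.
import tensorflow as tf
from tensorflow.keras import layers

t2_lr = layers.Input(shape=(256, 256, 1), name="t2_upsampled")
t1_hr = layers.Input(shape=(256, 256, 1), name="t1_high_res")
x = layers.Concatenate()([t2_lr, t1_hr])
for _ in range(3):
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
residual = layers.Conv2D(1, 3, padding="same")(x)
t2_sr = layers.Add()([t2_lr, residual])      # predict only the missing HF content
model = tf.keras.Model([t2_lr, t1_hr], t2_sr)
model.compile(optimizer="adam", loss="mae")
```

The residual connection reflects the framing in the abstract: the low-frequency content of the T2 image is already available, so the network only has to supply the high-frequency detail.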
199

Analyzing the feasibility of deep learning for action recognition in small datasets

Santos Junior, Juarez Monteiro dos 06 March 2018 (has links)
Action recognition is the computer vision task of identifying which action is happening in a given sequence of frames. Traditional approaches rely on handcrafted features and domain-specific algorithms, often resulting in limited accuracy. Substantial advances in deep learning and the availability of larger datasets have enabled techniques that yield better performance without domain-specific knowledge, recognizing actions based on the raw information from video sequences. However, deep learning algorithms usually require very large labeled datasets for training, and due to their increased capacity they often overfit small data, providing lower generalization power. This work aims to explore deep learning in the context of small action recognition datasets. Our goal is to achieve significant performance even in cases where labeled data is not abundant. To this end, we investigate distinct network architectures, data pre-processing, and fusion methods, providing guidelines and good practices for using deep learning on small datasets.
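One common fusion method in this setting, late fusion of per-stream class probabilities, can be sketched in a few lines; the streams, weights, and probabilities below are made-up stand-ins rather than the dissertation's actual models:

```python
# Illustrative late fusion of two stream classifiers (e.g. RGB and optical
# flow): average the per-class probabilities, then take the argmax.
import numpy as np

rgb_probs  = np.array([[0.7, 0.2, 0.1], [0.4, 0.5, 0.1]])   # softmax per clip
flow_probs = np.array([[0.5, 0.3, 0.2], [0.2, 0.7, 0.1]])

fused = 0.5 * rgb_probs + 0.5 * flow_probs    # weighted-average late fusion
predictions = fused.argmax(axis=1)
print(predictions)
```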
200

Recognition of cattle branding images using convolutional neural networks and support vector machines

Santos, Carlos Alexandre Silva dos 26 September 2017 (has links)
The automatic recognition of cattle branding images is a necessity for the government agencies responsible for this activity. To improve this process, this work proposes an architecture capable of performing the automatic recognition of these brandings. The architecture implements two methods, namely Bag-of-Features and CNN. For the Bag-of-Features method, the SURF algorithm was used to extract points of interest from the images, and K-means clustering was used to build the visual-word vocabulary. The Bag-of-Features method achieved an overall accuracy of 86.02% and a processing time of 56.705 seconds on a set containing 12 brandings and 540 images. For the CNN method, we created a complete network with five convolutional layers and three fully connected layers. The first convolutional layer takes as input images converted to the RGB color format; the ReLU function was used for activation and max-pooling for reduction. The CNN method achieved an overall accuracy of 93.28% and a processing time of 12.716 seconds on the same set of 12 brandings and 540 images. The CNN method comprises six steps: a) selecting the image database; b) selecting the pre-trained CNN model; c) pre-processing the images and applying the CNN; d) extracting the features from the images; e) training and classifying the images using SVM; f) assessing the classification results. The experiments were performed using the cattle-branding image set of a city hall. The metrics of overall accuracy, recall, precision, Kappa coefficient, and processing time were used to assess the performance of the proposed architecture. The results were satisfactory: the CNN method showed the best results compared to the Bag-of-Features method, being 7.26% more accurate and 43.989 seconds faster. Experiments were also conducted with the CNN method on sets of brandings with larger numbers of samples, yielding overall accuracy rates of 94.90% for 12 brandings and 840 images, and 80.57% for 500 brandings and 22,500 images, respectively.
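Steps d) through f) of the CNN method can be sketched as follows; the feature matrix here is random stand-in data in place of real CNN activations, and the linear kernel is an assumption:

```python
# Sketch of CNN-features + SVM classification scored with the same metrics
# the work reports (overall accuracy and Cohen's kappa).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score

n_images, n_features, n_brands = 540, 512, 12
X = np.random.randn(n_images, n_features)       # stand-in for pooled CNN activations
y = np.random.randint(0, n_brands, size=n_images)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="linear").fit(X_tr, y_tr)      # step e): train the SVM
pred = clf.predict(X_te)
print(accuracy_score(y_te, pred), cohen_kappa_score(y_te, pred))  # step f)
```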
