271 |
Efficient Adaptation of Deep Vision Models
Ze Wang (15354715) 27 April 2023
<p>Deep neural networks have made significant advances in computer vision. However, several challenges limit their real-world applications. For example, domain shifts in vision data degrade model performance; visual appearance variances affect model robustness; it is also non-trivial to extend a model trained on one task to novel tasks; and in many applications, large-scale labeled data are not even available for learning powerful deep models from scratch. This research focuses on improving the transferability of deep features and the efficiency of deep vision model adaptation, leading to enhanced generalization and new capabilities on computer vision tasks. Specifically, we approach these problems from the following two directions: architectural adaptation and label-efficient transferable feature learning. From an architectural perspective, we investigate various schemes that permit network adaptation to be parametrized by multiple copies of sub-structures, distributions of parameter subspaces, or functions that infer parameters from data. We also explore how model adaptation can bring new capabilities, such as continuous and stochastic image modeling, fast transfer to new tasks, and dynamic computation allocation based on sample complexity. From the perspective of feature learning, we show how transferable features emerge from generative modeling with massive unlabeled or weakly labeled data. Such features enable both image generation under complex conditions and downstream applications like image recognition and segmentation. By combining both perspectives, we achieve improved performance on computer vision tasks with limited labeled data, enhanced transferability of deep features, and novel capabilities beyond standard deep learning models.</p>
|
272 |
Quantum-Inspired Neural Architecture Search
Daniela de Mattos Szwarcman 13 August 2020
Deep neural networks are powerful and flexible models that have gained the attention of the machine learning community over the last decade. For a variety of tasks, they can even surpass human-level performance. Usually, to reach these results, an expert spends significant time designing the neural architecture, with long trial-and-error sessions. As a result, there is growing interest in automating this design process. To address the neural architecture search (NAS) problem, authors have presented new methods based on techniques such as reinforcement learning and evolutionary algorithms, but high computational cost is still an issue for many of them. To reduce this cost, researchers have proposed restricting the search space with the help of expert knowledge. Quantum-inspired evolutionary algorithms present promising results regarding faster convergence. Motivated by this idea, we propose Q-NAS: a quantum-inspired algorithm that searches for deep networks by assembling substructures. Q-NAS can also evolve some numerical hyperparameters, which is a first step toward complete automation. We ran several experiments with the CIFAR-10 dataset to analyze the details of the algorithm. For many parameter settings, Q-NAS achieved satisfactory results. Our best accuracies on the CIFAR-10 task were 93.85 percent for a residual network and 93.70 percent for a convolutional network, surpassing hand-designed models and some NAS methods. With the addition of a simple early-stopping mechanism, the evolution times for these runs were 67 GPU days and 48 GPU days, respectively. We also applied Q-NAS to CIFAR-100 without any parameter adjustment, reaching an accuracy of 74.23 percent, which is comparable to a ResNet with 164 layers. Finally, we present a case study with real datasets, in which we used Q-NAS to solve a seismic classification task. In less than 8.5 GPU days, Q-NAS generated networks with 12 times fewer weights and higher accuracy than a model created specifically for this task.
|
273 |
Evaluation of Data Augmentation Through Synthetic Image Generation for Segmentation and Detection of Polyps in Colonoscopy Images Using Machine Learning
Victor de Almeida Thomaz 17 August 2020
Colorectal cancer is currently the second leading cause of cancer death worldwide. In recent years, interest has grown in research aimed at developing automatic methods for polyp detection, and the most relevant results have been achieved through deep learning techniques. However, the performance of these approaches is strongly associated with the use of large and varied datasets. Samples of colonoscopy images are publicly available, but their limited quantity and variation may be insufficient for successful training. Based on this observation, this thesis describes a new approach for increasing the quantity and variation of colonoscopy images, improving the results of polyp segmentation and detection. Unlike other works in the literature, which use traditional data augmentation approaches and combine images from other exam modalities, the proposed methodology emphasizes the creation of new samples by inserting polyps into publicly available colonoscopy images. The insertion strategy uses synthetically generated polyps as well as real polyps, and applies processing techniques to preserve the realistic appearance of the images while automatically creating more diverse samples with appropriate labels for training. Convolutional neural networks trained with these improved datasets showed promising results in segmentation and detection. The improvements obtained indicate that new methods for automatically improving samples in medical image datasets have the potential to positively affect the training of convolutional networks.
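The insertion idea described above can be illustrated with a minimal sketch. The abstract does not specify the thesis's processing techniques, so this uses plain alpha blending on a grayscale image stored as nested lists; the function name `insert_patch`, the blending rule, and the 0.5 labelling threshold are all assumptions made for illustration.

```python
def insert_patch(image, label, patch, alpha, top, left):
    """Blend `patch` into a copy of `image` and update a copy of `label`.

    image, label: 2D lists (rows of pixel values / 0-1 labels).
    patch, alpha: 2D lists of the same shape; alpha is the blend weight in [0, 1].
    (top, left): where the patch's upper-left corner lands in the image.
    """
    out_img = [row[:] for row in image]   # copy so the source sample is kept intact
    out_lbl = [row[:] for row in label]
    for r, (p_row, a_row) in enumerate(zip(patch, alpha)):
        for c, (p, a) in enumerate(zip(p_row, a_row)):
            y, x = top + r, left + c
            out_img[y][x] = a * p + (1.0 - a) * out_img[y][x]
            if a > 0.5:                   # assumed threshold for "this pixel is polyp"
                out_lbl[y][x] = 1
    return out_img, out_lbl


# Hypothetical 3 x 3 background with a 1 x 2 patch inserted on the middle row.
image = [[0.0] * 3 for _ in range(3)]
label = [[0] * 3 for _ in range(3)]
new_image, new_label = insert_patch(image, label, [[1.0, 1.0]], [[1.0, 0.5]], top=1, left=0)
```

The point of returning the label alongside the image is that each synthetic sample arrives with its segmentation mask already filled in, which is what makes this kind of augmentation usable without manual annotation.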
|
274 |
An Investigation of Low-Rank Decomposition for Increasing Inference Speed in Deep Neural Networks With Limited Training Data
Wikén, Victor January 2018
In this study, low-rank tensor decomposition was implemented to increase the inference speed of convolutional neural networks and applied to AlexNet, which had been trained to classify dog breeds. Because the training set was small, transfer learning was used for the classification task. The purpose of the study is to investigate how effective low-rank tensor decomposition is when the training set is limited. Compared with a previous study, the results indicate a strong relationship between the effects of the tensor decomposition and the amount of available training data. A significant speedup can be obtained in the different convolutional layers using tensor decomposition. However, because the network must be retrained after the decomposition and the dataset is limited, the speedup comes at the cost of a slight decrease in accuracy.
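To see where the speedup comes from, here is a minimal cost-counting sketch. It assumes one simple factorization scheme (a k x k convolution into `rank` channels followed by a 1 x 1 convolution), which is not necessarily the decomposition used in the thesis; the layer shape and rank are hypothetical.

```python
def conv_cost(c_in, c_out, k, out_h, out_w):
    """Return (parameters, multiply-accumulates) of a k x k convolution, bias ignored."""
    params = c_in * c_out * k * k
    macs = params * out_h * out_w   # every weight is used once per output position
    return params, macs


def low_rank_cost(c_in, c_out, k, rank, out_h, out_w):
    """Cost of a k x k conv into `rank` channels followed by a 1 x 1 conv to c_out."""
    p1, m1 = conv_cost(c_in, rank, k, out_h, out_w)
    p2, m2 = conv_cost(rank, c_out, 1, out_h, out_w)
    return p1 + p2, m1 + m2


# Hypothetical layer: 256 -> 384 channels, 3 x 3 kernel, 13 x 13 output map, rank 64.
full_params, full_macs = conv_cost(256, 384, 3, 13, 13)
lr_params, lr_macs = low_rank_cost(256, 384, 3, 64, 13, 13)
speedup = full_macs / lr_macs   # theoretical ratio only; measured speedups are usually lower
```

The ratio is a count of arithmetic operations, not a timing. It also says nothing about accuracy, which is why retraining after decomposition matters and why limited training data becomes the bottleneck the abstract describes.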
|
275 |
Graphical Glitch Detection in Video Games Using CNNs
García Ling, Carlos January 2020
This work addresses the following research question: can we detect video game glitches using convolutional neural networks? We focus on the most common type of glitch, texture glitches (Stretched, Lower Resolution, Missing, and Placeholder). We first systematically generate a dataset containing both images with texture glitches and normal samples. To detect the faulty images, we try both classification and semantic segmentation approaches, with a clear focus on the former. The best classification setting uses a ShuffleNetV2 architecture and obtains precisions of 80.0%, 64.3%, 99.2%, and 97.0% for the glitch classes Stretched, Lower Resolution, Missing, and Placeholder, respectively, with a low false positive rate of 6.7%. To complement this study, we also discuss how the models extrapolate to different graphical environments, what the main sources of confusion for the model are, how to estimate the confidence of the network, and ways to interpret the internal behavior of the models.
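The two reported metrics can be computed as in the following sketch. It assumes that the false positive rate is the fraction of normal images predicted as any glitch class, and the label strings and toy predictions are made up for illustration.

```python
from collections import Counter


def per_class_precision(y_true, y_pred, classes):
    """Precision per class: correct predictions of c divided by all predictions of c."""
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {c: (correct[c] / predicted[c] if predicted[c] else 0.0) for c in classes}


def false_positive_rate(y_true, y_pred, negative="normal"):
    """Fraction of truly normal samples that were flagged as some glitch class."""
    preds_on_negatives = [p for t, p in zip(y_true, y_pred) if t == negative]
    if not preds_on_negatives:
        return 0.0
    return sum(p != negative for p in preds_on_negatives) / len(preds_on_negatives)


# Toy labels, not data from the thesis.
y_true = ["normal", "normal", "normal", "missing", "missing", "stretched"]
y_pred = ["normal", "missing", "normal", "missing", "missing", "normal"]
prec = per_class_precision(y_true, y_pred, ["missing", "stretched"])
fpr = false_positive_rate(y_true, y_pred)
```

A class that is never predicted gets precision 0.0 here by convention; other tools report it as undefined.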
|
276 |
Fashion Object Detection and Pixel-Wise Semantic Segmentation: Crowdsourcing framework for image bounding box detection & Pixel-Wise Segmentation
Mallu, Mallu January 2018
Technology has reshaped every aspect of our lives, and the fashion industry is one of them. Many deep learning architectures are emerging to enhance fashion experiences for everyone, and there are numerous possibilities for improving fashion technology with deep learning. One key idea is to generate fashion styles and recommendations using artificial intelligence. Another is to gather reliable information on fashion trends, which includes analysis of existing fashion-related images and data. When dealing specifically with images, localisation and segmentation are well-established ways to study the pixels, objects and labels present in an image. This master's thesis presents a complete framework for performing localisation and segmentation on fashionista images. The work is part of a research project on fashion style detection and recommendation. The developed solution localises fashion items in an image by drawing bounding boxes and labelling them. It also provides pixel-wise semantic segmentation functionality, which extracts label-pixel data for fashion items. The collected data can serve as ground truth as well as training data for the target deep learning architecture. A study of localisation and segmentation of videos is also presented. The developed system has been evaluated in terms of flexibility, output quality and reliability in comparison with similar platforms. It has proven to be a fully functional solution capable of providing essential localisation and segmentation services while keeping the core architecture simple and extensible.
|
277 |
DEEP SKETCH-BASED CHARACTER MODELING USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS
Aleena Kyenat Malik Aslam (14216159) 07 December 2022
<p>3D character modeling is a crucial part of asset creation in the entertainment industry, particularly for animation and games. A fully automated pipeline via sketch-based 3D modeling (SBM) is an emerging possibility, but development is stalled by unrefined outputs and a lack of character-centered tools. This thesis proposes an improved method for constructing 3D character models with minimal user input, using only two sketches: an unshaded front view and an unshaded side view. The system uses a deep convolutional neural network (CNN), a type of deep learning model, to process the input sketches and generate multi-view depth, normal and confidence maps that offer more information about the 3D surface. These are then fused into a 3D point cloud, a representation of an object as points in 3D space. The point cloud is converted into a 3D mesh via an occupancy network, involving another CNN, for a more precise 3D representation. This reconstruction step competes with non-deep-learning approaches such as Poisson reconstruction. The proposed system is evaluated for character generation on standardized quantitative metrics (i.e., Chamfer Distance [CD], Earth Mover’s Distance [EMD], F-score and Intersection over Union [IoU]) and compared to the base framework trained on the same character sketch and model database. This implementation offers a significant improvement in the accuracy of vertex positions for the reconstructed character models.</p>
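Two of the metrics named above are simple enough to sketch. Definitions of Chamfer Distance vary (squared or unsquared distances, sums or means), so the variant below is one common choice and may differ from the one used in the thesis; the IoU is computed over sets of occupied voxel indices, which is also an assumption.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (lists of coordinate tuples).

    Uses the mean unsquared nearest-neighbour distance in each direction.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def voxel_iou(a, b):
    """Intersection over Union of two collections of occupied voxel indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy point sets, not data from the thesis.
pred = [(0, 0, 0), (1, 0, 0)]
truth = [(0, 0, 0), (1, 0, 1)]
cd = chamfer_distance(pred, truth)
iou = voxel_iou(pred, truth)
```

This brute-force version is quadratic in the number of points; real evaluations on dense point clouds typically use a spatial index for the nearest-neighbour search.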
|
278 |
Computer Vision for Document Image Analysis and Text Extraction
Benchekroun, Omar January 2022
Automatic document processing has been a subject of interest in industry for the past few years, especially with recent advances in machine learning and computer vision. This project investigates in depth a major component of document image processing, Optical Character Recognition (OCR). First, an improvement on an existing shallow CNN+LSTM model is proposed, using domain-specific data synthesis. We demonstrate that this model can achieve an accuracy of up to 97% on non-handwritten text, with an accuracy improvement of 24% when using synthetic data. We then address handwritten text, which presents more challenges, including variation in writing style, slanting, and character ambiguity. A CNN+Transformer architecture is validated for recognizing handwriting extracted from real-world insurance statement data. This model achieves a maximum accuracy of 92% on real-world data. Finally, we demonstrate how a data pipeline relying on synthetic data can be a scalable and affordable solution for modern OCR needs.
|
279 |
Deep Learning-based Regularizers for Cone Beam Computed Tomography Reconstruction
Syed, Sabina, Stenberg, Josefin January 2023
Cone Beam Computed Tomography is a technology for visualizing the 3D interior anatomy of a patient. It is important for image-guided radiation therapy in cancer treatment. Iterative methods are often used for the image reconstruction step of a scan. A key challenge is the ill-posedness of the resulting inverse problem, which causes the images to become noisy. To combat this, regularizers can be introduced, which help stabilize the problem. This thesis focuses on Adversarial Convex Regularization, which uses deep learning to regularize the scans according to a target image quality. It can be interpreted in a Bayesian setting by letting the regularizer be the prior, approximating the likelihood with the measurement error, and obtaining the patient image through the maximum-a-posteriori estimate. Adversarial Convex Regularization has previously shown promising results in regular Computed Tomography, and this study aims to investigate its potential in Cone Beam Computed Tomography. Three different learned regularization methods have been developed, all based on convolutional neural network architectures. One model is based on three-dimensional convolutional layers, while the remaining two rely on 2D layers. These two are later adapted to 3D reconstruction, either by stacking a 2D model or by taking a weighted average of 2D models trained in three orthogonal planes. All neural networks are trained on simulated male pelvis data provided by Elekta. The 3D convolutional neural network model proved to be heavily memory-consuming while not outperforming current reconstruction methods with respect to image quality. The two architectures based on merging multiple 2D neural network gradients for 3D reconstruction are novel contributions that avoid the memory issues. These two models outperform current methods on multiple image quality metrics, such as Peak Signal-to-Noise Ratio and Structural Similarity Index Measure, and they also generalize well to real Cone Beam Computed Tomography data. Additionally, the architecture based on a weighted average of 2D neural networks captures spatial interactions to a larger extent and can be adjusted to favor the plane that best shows the region of interest, a possibly desirable feature in medical practice.
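The Bayesian reading mentioned in the abstract can be written out as a short derivation. The symbols are assumptions introduced here for illustration: $y$ is the measured projection data, $A$ a linear forward operator, $x$ the patient image, $R_\theta$ the learned regularizer, $\sigma^2$ the variance of an assumed Gaussian measurement error, and $\lambda$ a weighting constant, with the prior taken as $p(x) \propto \exp(-\lambda R_\theta(x))$.

```latex
\begin{aligned}
\hat{x}_{\mathrm{MAP}}
  &= \arg\max_{x}\; p(x \mid y)
   = \arg\min_{x}\; \bigl[ -\log p(y \mid x) - \log p(x) \bigr] \\
  &= \arg\min_{x}\; \frac{1}{2\sigma^{2}} \lVert A x - y \rVert_{2}^{2} + \lambda\, R_{\theta}(x)
\end{aligned}
```

Under these assumptions the data-fidelity term comes from the likelihood and the regularizer plays the role of the negative log-prior, so a convex $R_\theta$ keeps the whole objective convex when $A$ is linear.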
|
280 |
Neural Networks for improved signal source enumeration and localization with unsteered antenna arrays
Rogers, John T, II 08 December 2023
Direction of arrival estimation using unsteered antenna arrays, unlike mechanically scanned or phased arrays, requires complex algorithms that perform poorly with small-aperture arrays or without a large number of observations, or snapshots. In general, these algorithms compute a sample covariance matrix to obtain the direction of arrival, and some require a prior estimate of the number of signal sources. Herein, artificial neural network architectures are proposed that demonstrate improved estimation of the number of signal sources, the true signal covariance matrix, and the direction of arrival. The proposed source-number estimation network demonstrates robust performance in the case of coherent signals, where conventional methods fail. For covariance matrix estimation, four different network architectures are assessed, and the best-performing architecture achieves a 20-fold improvement in performance over the sample covariance matrix. Additionally, this network can achieve performance comparable to the sample covariance matrix with one-eighth the number of snapshots. For direction of arrival estimation, preliminary results are provided comparing six architectures, all of which demonstrate high levels of accuracy and show the benefits of progressively training artificial neural networks by training on a sequence of sub-problems and extending the network to encapsulate the entire process.
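For reference, the sample covariance matrix that these algorithms start from can be sketched as follows. This is the standard estimator, R = (1/N) * sum of x_n x_n^H over N snapshots, written in plain Python; the function name and the toy snapshots are illustrative and not taken from the dissertation.

```python
def sample_covariance(snapshots):
    """Sample covariance of N snapshots, each a list of M complex array outputs.

    Returns an M x M nested list with entry (i, j) = mean of x[i] * conj(x[j]).
    """
    n = len(snapshots)
    m = len(snapshots[0])
    cov = [[0j] * m for _ in range(m)]
    for x in snapshots:
        for i in range(m):
            for j in range(m):
                cov[i][j] += x[i] * x[j].conjugate()
    return [[value / n for value in row] for row in cov]


# Two toy snapshots from a hypothetical two-element array.
snapshots = [[1 + 0j, 1j], [1 + 0j, -1j]]
cov = sample_covariance(snapshots)
```

With only two snapshots the estimate is very coarse, which is the regime the abstract is concerned with: the fewer the snapshots, the further this estimator tends to be from the true covariance.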
|