Robustness of Image Classification Using CNNs in Adverse Conditions. Ingelstam, Theo; Skåntorp, Johanna (January 2022)
The usage of convolutional neural networks (CNNs) has revolutionized the field of computer vision. Though the algorithms used in image recognition have improved significantly in the past decade, they are still limited by the availability of training data. This paper aims to gain a better understanding of how limitations in the training data might affect the performance of the system. A robustness study was conducted using three different image datasets: models were pre-trained on ImageNet or CIFAR-10 and then trained on the MAdWeather dataset, whose main characteristic is that it contains images with differing levels of obscurity in front of the objects. The MAdWeather dataset is then used to test how accurately a model can identify images that differ from its training data. The study shows that a CNN's performance under one condition does not translate well to other conditions. / Bachelor's thesis in electrical engineering 2022, KTH, Stockholm
Detection and categorization of suggestive thumbnails: A step towards a safer internet. Oliveira Franca, Matheus (January 2021)
The aim of this work is to compare methods that predict whether an image has suggestive content, such as pornographic images or erotic fashion. Using binary classification, this work contributes to an internet environment where such images are not shown out of context. This matters for user-experience purposes such as child protection, for publishers who do not want their campaigns associated with inappropriate content, and for companies protecting their brand safety. For this study, a dataset of more than 500k images was created to test the following convolutional neural network (CNN) models: the NSFW model, ResNet, EfficientNet, BiT, NudeNet, and the Yahoo model. The image classification models EfficientNet-B7 and Big Transfer (BiT) presented the best results, with over 91% of samples correctly classified on the test set and precision and recall around 0.7. Model predictions were further investigated using Local Interpretable Model-agnostic Explanations (LIME), a model explainability technique; this analysis concluded that the model uses regions of the thumbnail that are coherent from a human perspective, such as legs, abdomen, and chest, to classify images as unsafe.
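The reported numbers (91% accuracy but precision/recall around 0.7) illustrate how accuracy and precision/recall can diverge on an imbalanced binary task. A minimal sketch of the three metrics for a binary "suggestive" classifier (the label values and example data below are hypothetical):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall, treating label 1 as the
    positive ("suggestive"/unsafe) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

On a test set where unsafe images are rare, a model can score high accuracy while still missing a sizeable fraction of the positive class, which is why the thesis reports all three numbers.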
Image Classification of Cars using Deep Learning. Lindespång, Victor (January 2017)
This report describes how an image classifier was created with the ability to identify car make and model from a given picture of a car. The classifier was developed using pictures that the company CAB had saved from insurance errands managed through their current products. The report begins with a brief theoretical introduction to machine learning and deep learning to guide the reader into the subject, and then continues with problem-specific methods that were of use for the project. The report covers how the data was processed before training took place, how the training process went with the chosen tools, and a discussion of the result and what affected it, with comments on what can be done in the future to improve the end product.
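The report does not reproduce its preprocessing code; as an illustration of one common first step for such a classifier, a class-balanced (stratified) train/validation split over labeled image paths can be sketched as follows (all names and proportions here are assumptions):

```python
import random
from collections import defaultdict

def stratified_split(samples, val_fraction=0.2, seed=42):
    """samples: list of (image_path, label) pairs. Returns (train, val)
    with each label's proportion preserved in both parts."""
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)
    rng = random.Random(seed)
    train, val = [], []
    for label, items in by_label.items():
        rng.shuffle(items)  # deterministic given the seed
        cut = int(len(items) * val_fraction)
        val.extend(items[:cut])
        train.extend(items[cut:])
    return train, val
```

Splitting per class rather than globally avoids a validation set that under-represents rare car models.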
Track the number of people in a premises in real time. Heidar, Hamza (January 2022)
It has become increasingly common for indoor businesses to want to monitor the number of people on their premises. Manually counting people or using motion sensors has various disadvantages. For this reason, it is worth exploring other, more automated technical solutions that use simple components. The literature study provided an understanding of image analysis and of the technical tools that can be used to analyze images. Amazon Rekognition and OpenCV are two of the tools used to build a prototype that can count the number of people in a room in real time. The results showed that a solution with OpenCV is not feasible with the knowledge the literature study provided. The results from Amazon Rekognition indicate that it is possible to count the number of people with very high accuracy and precision. Just as a human can be distracted, the prototype can also miss individual people. Amazon Rekognition could also distinguish people from other objects, which a motion sensor cannot do. The results also showed a few shortcomings, such as poor response time, but these could have been remedied if more time had remained.
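A person count like the one described can be derived from the response of Amazon Rekognition's `DetectLabels` API, where each detected person appears as an entry under the `Person` label's `Instances` list. The helper below only parses a response of that documented shape; the actual service call (shown as a comment) requires AWS credentials and is not part of the runnable sketch.

```python
def count_people(response):
    """Count person instances in an Amazon Rekognition DetectLabels response."""
    for label in response.get("Labels", []):
        if label["Name"] == "Person":
            # Each entry in "Instances" is one detected person bounding box.
            return len(label.get("Instances", []))
    return 0

# In the prototype, the response would come from the service, e.g.:
# import boto3
# client = boto3.client("rekognition")
# response = client.detect_labels(Image={"Bytes": frame_bytes}, MinConfidence=80)
```

Counting label instances per frame is what lets this approach distinguish people from other moving objects, unlike a motion sensor.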
A Deep Learning Approach to Detection and Classification of Small Defects on Painted Surfaces: A Study Made on Volvo GTO, Umeå. Rönnqvist, Johannes; Sjölund, Johannes (January 2019)
In this thesis we conclude that convolutional neural networks, together with phase-measuring deflectometry techniques, can be used to create models which detect and classify defects on painted surfaces very well, even compared to experienced humans. Further, we show which preprocessing measures enhance the performance of the models. We see that standardisation increases the classification accuracy of the models. We demonstrate that cleaning the data through relabelling and removing faulty images improves classification accuracy, and especially the models' ability to distinguish between different types of defects. We show that oversampling might be a feasible method to improve accuracy by increasing and balancing the data set through augmentation of existing observations. Lastly, we find that combining many images with different patterns greatly increases the classification accuracy of the models. Our proposed approach is demonstrated to work well in a real-time factory environment. Automated quality control of the painted surfaces of Volvo Truck cabins could give great benefits in cost and quality. It could provide data for root-cause analysis and a quick and efficient alarm system. This could significantly streamline production while reducing costs and errors. Corrections and optimisation of the processes could be made at earlier stages and with higher precision than today.
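The oversampling-by-augmentation idea the thesis evaluates (duplicating augmented copies of minority-class observations until classes are balanced) can be sketched in a few lines of NumPy. The horizontal flip used here is only a stand-in for whatever augmentations the authors actually applied:

```python
import numpy as np

def oversample_balanced(images, labels, rng=None):
    """Balance classes by appending augmented duplicates of minority-class
    images (here the augmentation is a simple horizontal flip)."""
    rng = np.random.default_rng(0) if rng is None else rng
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    out_imgs, out_labels = list(images), list(labels)
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(labels == cls)
        for _ in range(target - count):
            pick = rng.choice(idx)
            out_imgs.append(np.fliplr(images[pick]))  # augmented duplicate
            out_labels.append(cls)
    return out_imgs, out_labels
```

Balancing this way biases the model less toward frequent defect types, at the cost of some redundancy in the training set.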
A new approach to automatic saliency identification in images based on irregularity of regions. Al-Azawi, Mohammad Ali Naji Said (January 2015)
This research introduces an image retrieval system which is, in different ways, inspired by the human vision system. The main problems with existing machine vision systems and image understanding are studied and identified in order to design a system that relies on human image understanding. The main improvement of the developed system is that it uses human attention principles in the process of identifying image contents. Human attention is represented by saliency-extraction algorithms, which extract the salient regions, in other words the regions of interest. This work presents a new approach to saliency identification which relies on the irregularity of a region. Irregularity is clearly defined and measuring tools are developed; these measures are derived from the formality and variation of the region with respect to the surrounding regions. Both local and global saliency have been studied, and appropriate algorithms were developed based on the local and global irregularity defined in this work. The need for suitable automatic clustering techniques motivated a study of the available clustering techniques and the development of a technique suitable for clustering salient points. Based on the fact that humans usually look at the region surrounding the gaze point, an agglomerative clustering technique is developed utilising the principles of blob extraction and intersection. Automatic thresholding was needed at different stages of the system's development; therefore, a fuzzy thresholding technique was developed. Evaluation methods for salient-region extraction have been studied and analysed; subsequently, evaluation techniques were developed that are based on the extracted regions (or points) and compare them with ground-truth data. The proposed algorithms were tested against standard datasets and compared with existing state-of-the-art algorithms.
Both quantitative and qualitative benchmarking are presented in this thesis, together with a detailed discussion of the results. The benchmarking showed promising results for the different algorithms. The developed algorithms have been utilised in designing an integrated saliency-based image retrieval system which uses the salient regions to give a description of the scene. The system auto-labels the objects in the image by identifying the salient objects and assigns labels based on the contents of the knowledge database. In addition, the system identifies the unimportant part of the image (the background) to give a full description of the scene.
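The thesis derives its irregularity measures from the formality and variation of a region relative to its surroundings; the toy sketch below only illustrates the general idea of scoring regions by how much they deviate from the rest of the image, and is not the author's actual measure:

```python
import numpy as np

def block_irregularity(gray, block=8):
    """Toy global-irregularity map: split a grayscale image into square
    regions and score each region by how far its mean intensity deviates
    from the mean over all regions. High scores mark candidate salient regions."""
    gh, gw = gray.shape[0] // block, gray.shape[1] // block
    crop = gray[:gh * block, :gw * block].astype(float)
    means = crop.reshape(gh, block, gw, block).mean(axis=(1, 3))
    return np.abs(means - means.mean())
```

A local variant would compare each region only to its neighbours instead of the whole image, which is closer in spirit to the thesis's local/global distinction.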
A method for execution of convolutional neural networks in FPGA. Sousa, Mark Cappello Ferreira de (26 April 2019)
Convolutional Neural Networks have been used successfully for pattern recognition in images. However, their high computational cost and the large number of parameters involved make it difficult to execute this type of artificial neural network in real time in embedded applications, where processing power and data storage capacity are restricted. This work studied and developed a method for real-time execution of a trained Convolutional Neural Network on FPGAs, taking advantage of the parallel processing power of this type of device. The focus of this work was the execution of the convolutional layers, since these layers can contribute up to 99% of the computational load of the entire network. In the experiments, an FPGA device was used in conjunction with a dual-core ARM processor on the same silicon substrate, and only the FPGA was used to execute the convolutional layers of the AlexNet Convolutional Neural Network. The method focuses on the efficient distribution of FPGA resources through balancing of the pipeline formed by the convolutional layers, the use of buffers for the reduction and reuse of memory for storing intermediate data (generated and consumed by the convolutional layers), and the use of 8-bit numerical precision to store the kernels and increase their read throughput. With the developed method, it was possible to execute all five AlexNet convolutional layers in 3.9 ms at a maximum operating frequency of 76.9 MHz. It was also possible to store all the parameters of the convolutional layers in the internal memory of the FPGA, eliminating potential external-memory access bottlenecks.
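The 8-bit kernel storage mentioned above is typically achieved with a symmetric quantization scheme: each float32 kernel tensor is stored as int8 values plus one scale factor, quartering the memory footprint. The sketch below illustrates that scheme in NumPy (the quantization details are a common convention, not necessarily the exact scheme used in the thesis; the kernel shape mimics the original AlexNet's first convolutional layer):

```python
import numpy as np

def quantize_kernels_int8(weights):
    """Symmetric 8-bit quantization: map floats to int8 with one shared scale."""
    scale = max(np.abs(weights).max() / 127.0, 1e-12)  # guard against all-zero kernels
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float kernels; max error is scale / 2."""
    return q.astype(np.float32) * scale
```

Beyond memory savings, reading four int8 kernels per 32-bit word also quadruples the effective read bandwidth, which is the throughput benefit the abstract refers to.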
Segmentation and classification of digitized document images. Ouji, Asma (1 June 2012)
In this thesis, we deal with printed-document image processing and analysis to automate the creation of press reviews. The scanner output images are processed without any prior knowledge or human intervention. To characterize them, we present a scalable analysis system for complex colour documents. This characterization is based on a hybrid colour segmentation suited to noisy document images. The colour analysis customizes text-extraction algorithms to fit the local image properties. The provided colour and text information is used to perform layout segmentation of press images and to compute features on the resulting blocks, which are then classified in order to, among other things, detect advertisements. In the second part of this thesis, we address a more general purpose: clustering and classification. We present a new clustering approach, named ACPP, which is completely automated, fast, and easy to use. This approach differs from the great majority of existing methods, which rely on prior knowledge about the data and depend on abstract parameters that are difficult for the user to determine. Colour analysis, layout segmentation, and the ACPP classification method are combined to create a complete processing chain for classifying digitized press documents by content.
Shape matching based on a hierarchical skeletonization. Leborgne, Aurélie (11 July 2016)
The work performed during this thesis focuses on the matching of planar shapes based on a hierarchical skeletonization. First, we approach the creation of a shape skeleton using an algorithm combining tools from discrete geometry with filters. This combination yields a skeleton gathering the properties desired in the context of matching. Nevertheless, the resulting skeleton remains a representation of the shape that does not differentiate branches representing the general outline from those arising from a detail of the shape. When matching, however, it seems more interesting to pair branches of the same order of importance, and also to give more weight to associations describing the overall appearance of the shapes. Our second contribution focuses on solving this problem: it concerns the prioritization of the previously created skeleton's branches by assigning each a weight reflecting its importance in the shape. To this end, we progressively smooth a shape and study the persistence of branches in order to assign the weights. The final step is to match shapes using their hierarchical skeletons, modeled as hypergraphs. In other words, we associate branches two by two to determine a dissimilarity measure between two shapes, taking into account the geometry of the shapes, the relative position of their different parts, and their importance.
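The idea of a weighted branch-to-branch dissimilarity can be sketched as follows. This is a deliberately simplified stand-in: the greedy one-to-one matching, the Euclidean branch distance, and the unmatched-weight penalty are all illustrative assumptions, not the thesis's hypergraph-based method.

```python
import math

def shape_dissimilarity(a, b):
    """a, b: lists of (weight, feature_vector) describing skeleton branches.
    Greedily pairs closest branches one-to-one; matched pairs contribute
    their feature distance scaled by branch importance, and unmatched
    branches are penalised by their weight."""
    pairs = sorted(
        (math.dist(fa, fb), i, j)
        for i, (_, fa) in enumerate(a)
        for j, (_, fb) in enumerate(b)
    )
    used_a, used_b = set(), set()
    score = 0.0
    for d, i, j in pairs:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        score += (a[i][0] + b[j][0]) / 2 * d
    score += sum(w for k, (w, _) in enumerate(a) if k not in used_a)
    score += sum(w for k, (w, _) in enumerate(b) if k not in used_b)
    return score
```

Weighting by branch importance means that a mismatch on a main limb of the shape costs far more than one on a minor detail, which is the point of the hierarchical skeleton.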
Image recognition of cattle brands using convolutional neural networks and support vector machines. Santos, Carlos Alexandre Silva dos (26 September 2017)
The automatic recognition of cattle brandings is a necessity for the government agencies responsible for this activity. In order to improve this process, this work proposes an architecture capable of performing the automatic recognition of these brandings. The proposed software implements two methods, namely Bag-of-Features and CNN. For the Bag-of-Features method, the SURF algorithm was used to extract points of interest from the images, and K-means clustering was used to create the visual-word codebook. The Bag-of-Features method presented an overall accuracy of 86.02% and a processing time of 56.705 seconds on a set containing 12 brandings and 540 images. For the CNN method, we created a complete network with five convolutional layers and three fully connected layers. For the first convolutional layer, the input images were converted into the RGB color format. The CNN uses the ReLU activation function and the max-pooling technique for reduction. The CNN method presented an overall accuracy of 93.28% and a processing time of 12.716 seconds on a set containing 12 brandings and 540 images. The CNN method includes six steps: a) selecting the image database; b) selecting the pre-trained CNN model; c) pre-processing the images and applying the CNN; d) extracting the features from the images; e) training and classifying the images using SVM; f) assessing the classification results. The experiments were performed using the cattle-branding image set of a city hall. Metrics of overall accuracy, recall, precision, Kappa coefficient, and processing time were used to assess the performance of the proposed architecture. The results were satisfactory: the CNN method showed the best results when compared to the Bag-of-Features method, being 7.26% more accurate and 43.989 seconds faster. Experiments were also conducted with the CNN method on sets of brandings with a greater number of samples, which presented overall accuracy rates of 94.90% for 12 brandings and 840 images, and 80.57% for 500 brandings and 22,500 images, respectively.
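In the Bag-of-Features method described above, once SURF descriptors are extracted and a K-means codebook of visual words has been built, each image is represented as a histogram of word assignments. A minimal NumPy sketch of that assignment step (the toy codebook stands in for the K-means centroids, and the descriptor dimensionality is arbitrary):

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest visual word and
    return the normalised word histogram for the image."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

These fixed-length histograms are what a classifier (an SVM, in this work's CNN pipeline as well) is then trained on, regardless of how many interest points each image produced.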