1 |
Object Detection Using Convolutional Neural Networks Together with Random Forest and Support Vector Machines
Campanini García, Diego Alejandro, January 2018
Thesis for the degree of Ingeniero Civil Eléctrico (Electrical Engineer) / This thesis develops an object detection system (localization and classification) based on convolutional neural networks (CNNs) and two classical machine learning methods, Random Forest (RF) and Support Vector Machines (SVMs). The idea is to use these classifiers to improve the performance of the detection system known as Faster R-CNN (Regions with CNN features).
Faster R-CNN is based on the concept of region proposals: it generates candidate samples that may be objects and then produces two outputs, one with the regression that characterizes the location of the objects and another with the confidence scores associated with the predicted bounding boxes. Both outputs are generated by fully connected layers. This work intervenes in the output that produces the confidence scores: a classifier (RF or SVM) is connected at this point and generates the system's output scores. In this way, the work aims to improve the performance of Faster R-CNN.
The classifiers are trained with feature vectors extracted from one of the fully connected layers of Faster R-CNN; specifically, each of the three fully connected layers in the architecture is tested to determine which gives the best results. To define, among other things, the number of convolutional layers and the size of the filters in the first layers of Faster R-CNN, the ZF and VGG16 convolutional network models are used. These are classification-only networks and are the same ones used in the original Faster R-CNN.
The proposed systems are developed with several implementations and libraries whose source code is openly available. For the Faster R-CNN detector, an implementation written in Python is used; for RF, two libraries are compared: randomForest, written in R, and scikit-learn, in Python. For SVM, the library known as LIBSVM, written in C++, is used. The main programming tasks are: developing the algorithms for labeling the feature vectors extracted from the fully connected layers; integrating the classifiers with the base system for online analysis of images in the test stage; and programming a time- and memory-efficient training algorithm for SVM (known as hard negative mining).
Evaluation of the developed systems shows that the best results are obtained with the VGG16 network, specifically with the Faster R-CNN+SVM system using an RBF (radial basis function) kernel, which reaches a mean Average Precision (mAP) of 68.9%. The second-best result, 67.8%, is obtained with Faster R-CNN+RF using 180 trees. The original Faster R-CNN system achieves an mAP of 69.3%, so neither hybrid system surpasses the baseline.
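The hard negative mining loop mentioned above can be sketched as follows. This is a minimal illustration, not the author's code: a linear SVM trained with Pegasos-style subgradient descent stands in for LIBSVM's RBF-kernel SVM, plain Python lists stand in for the fully connected layer features, and the names `train_linear_svm`, `hard_negative_mining`, `init_neg` and `rounds` are hypothetical. The point is that the classifier never holds all background regions at once: it trains on the positives plus a small negative subset, then adds only the negatives that still fall inside the margin.

```python
import random

# Sketch only: a linear SVM (Pegasos-style SGD) stands in for LIBSVM's RBF SVM.

def _augment(x):
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]

def score(w, x):
    """Decision value w . [x, 1] of the linear SVM."""
    return sum(wj * xj for wj, xj in zip(w, _augment(x)))

def train_linear_svm(xs, ys, lam=0.01, epochs=50, seed=0):
    """Subgradient descent on the L2-regularized hinge loss; ys are +1 / -1."""
    rng = random.Random(seed)
    data = [(_augment(x), y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step towards the sample
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def hard_negative_mining(positives, negatives, init_neg=20, rounds=5, seed=0):
    """Train on all positives plus a growing subset of negatives.
    Hypothetical parameters: init_neg random negatives to start, at most
    `rounds` retraining passes. Each pass adds negatives with score > -1."""
    rng = random.Random(seed)
    active = set(rng.sample(range(len(negatives)), min(init_neg, len(negatives))))
    w = None
    for _ in range(rounds):
        neg = [negatives[i] for i in sorted(active)]
        w = train_linear_svm(positives + neg, [1] * len(positives) + [-1] * len(neg))
        hard = {i for i, x in enumerate(negatives) if score(w, x) > -1.0}
        if hard <= active:  # no new hard negatives: stop
            break
        active |= hard
    return w, active
```

Only the active subset of negatives is ever passed to the trainer, which is where the time and memory savings come from.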
|
2 |
Faster R-CNN Based CubeSat Close Proximity Detection and Attitude Estimation
Sujeewa Samarawickrama, N G I, 09 August 2019
Automatic detection of space objects in optical images is important for close proximity operations, relative navigation, and situational awareness. To better protect space assets, it is important to know not only where a space object is, but also what it is. In this dissertation, a method for detecting multiple 1U, 2U, 3U, and 6U CubeSats based on the faster region-based convolutional neural network (Faster R-CNN) is described. CubeSat detection models are developed using Web-searched and computer-aided design images. In addition, a two-step method is presented for detecting a rotating CubeSat in close proximity from a sequence of images without using intrinsic or extrinsic camera parameters. First, a Faster R-CNN trained on synthetic images of 1U, 2U, 3U, and 6U CubeSats locates the CubeSat in each image and assigns a weight to each CubeSat class. Then, these classification results are combined using Dempster's rule of combination. The method is tested on simulated scenarios in which the rotating 3U and 6U CubeSats are seen from unfavorable views or in dark environments. Faster R-CNN detection results contain useful information for tracking, navigation, pose estimation, and simultaneous localization and mapping. A coarse single-point attitude estimation method is proposed that uses the centroids of the bounding boxes surrounding the CubeSats in the image. The centroids define the line-of-sight (LOS) vectors to the detected CubeSats in the camera frame, and the LOS vectors in the reference frame are assumed to be obtained from the Global Positioning System (GPS). The three-axis attitude is determined from these vector observations by solving Wahba's problem. The attitude estimation concept is tested on simulated scenarios using Autodesk Maya.
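Dempster's rule of combination, used above to fuse the per-image classification results, can be sketched as below. The abstract does not say how the Faster R-CNN class weights become mass functions; `scores_to_mass`, which normalizes the scores and reserves an assumed `ignorance` mass for the whole frame {1U, 2U, 3U, 6U}, is a hypothetical choice for illustration, as are the helper names `fuse_sequence` and `decide`.

```python
from functools import reduce
from itertools import product

# Frame of discernment: the CubeSat classes named in the abstract.
FRAME = frozenset({"1U", "2U", "3U", "6U"})

def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions given as {frozenset: mass}."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass on contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass 1 - K.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

def scores_to_mass(scores, ignorance=0.1):
    """Hypothetical conversion of per-class detector weights into a mass
    function; `ignorance` is an assumed mass left on the whole frame."""
    total = sum(scores.values())
    mass = {frozenset({c}): (1.0 - ignorance) * s / total for c, s in scores.items()}
    mass[FRAME] = mass.get(FRAME, 0.0) + ignorance
    return mass

def fuse_sequence(masses):
    """Combine the evidence from every image in the sequence."""
    return reduce(dempster_combine, masses)

def decide(mass):
    """Pick the single class with the largest combined mass."""
    singletons = {next(iter(k)): v for k, v in mass.items() if len(k) == 1}
    return max(singletons, key=singletons.get)
```

Because the rule is associative and commutative, the order in which the image results are fused does not change the final masses.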
|
3 |
USING ADVANCED DEEP LEARNING TECHNIQUES TO IDENTIFY DRAINAGE CROSSING FEATURES
Edidem, Michael Isaiah, 01 August 2024
High-resolution digital elevation models (HRDEMs) enable precise mapping of hydrographic features. However, the absence of drainage crossings (channels passing under roads or bridges) hinders accurate delineation of stream networks. Traditional methods for locating these crossings, such as on-screen digitization and field surveys, are time-consuming and expensive for extensive areas. This study investigates the effectiveness of deep learning models for automated drainage crossing detection using HRDEMs. It also explores the performance of advanced classification models such as EfficientNetV2 for drainage crossing classification, using various co-registered HRDEM-derived geomorphological features such as positive openness, geometric curvature, and topographic position index (TPI) variants. The results reveal that individual layers, particularly HRDEM and TPI21, achieve the best performance, while combining all five layers does not improve accuracy. Hence, effective feature screening is crucial, as eliminating less informative features improves the F1 score. For drainage crossing detection, this study develops and trains deep learning object detectors, Faster R-CNN and YOLOv5, using HRDEM tiles and ground-truth labels. These models achieve an average F1 score of 0.78 in a Nebraska watershed and transfer successfully to other watersheds. This spatial object detection approach offers a promising avenue for automated, large-scale drainage crossing detection, facilitating the integration of these features into HRDEMs and improving the accuracy of hydrographic network delineation.
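The topographic position index (TPI) layers mentioned above compare each cell's elevation with the mean elevation of its neighbourhood: negative values indicate valleys or channels, positive values ridges or embankments such as road fills. A naive sketch, assuming that "TPI21" denotes a 21 × 21-cell square window (the abstract does not define it) and that the centre cell is excluded from the mean; a production version would use a summed-area table or a raster library instead of the nested loops.

```python
def tpi(dem, window=21):
    """Topographic position index of every cell of a DEM (list of rows).

    Assumptions: square window of `window` cells (21 for a "TPI21" layer),
    centre cell excluded from the mean, window clipped at the raster edges.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd number >= 3")
    r = window // 2
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total, count = 0.0, 0
            for di in range(max(0, i - r), min(rows, i + r + 1)):
                for dj in range(max(0, j - r), min(cols, j + r + 1)):
                    if di == i and dj == j:
                        continue  # exclude the centre cell
                    total += dem[di][dj]
                    count += 1
            # Elevation relative to the neighbourhood mean.
            out[i][j] = dem[i][j] - total / count if count else 0.0
    return out
```

For example, a 3 × 3 DEM with a 4 m cell surrounded by 10 m cells gives a TPI of -6 at the centre when `window=3`.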
|
4 |
Image-Text Context Relation Using Machine Learning: Research on Performance of Different Datasets
Sun, Yuqi, January 2022
Building on progress in the fields of computer vision and natural language processing, Vision-Language (VL) models are designed to process information from both images and texts. This thesis focuses on the performance of one model, Oscar, on different datasets. Oscar is a state-of-the-art VL representation learning model built on a pre-trained object detection model and a pre-trained BERT model. By comparing the model's performance across datasets, we can understand the relationship between the properties of a dataset and the performance of the model. The conclusions could guide future work on VL datasets and models. In this thesis, I collected five VL datasets that each differ from the others in at least one main respect, and generated 8 subsets from them. I trained the same model on the different subsets to classify whether an image is related to a text. Conventionally, clean, human-annotated datasets are expected to perform better, because their images show everyday scenes and are annotated by humans; as a consequence, such datasets are always limited in size. Interestingly, however, a dataset generated by models trained on other datasets achieved performance as good as the clean datasets. This should encourage research on models for data collection. The experimental results also indicate that future work on VL models could focus on improving feature extraction from images, since the images have a great influence on the performance of VL models.
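The binary image-text relation task can be framed by pairing each image with its own caption (label 1) and with captions drawn from other images (label 0). This is a hypothetical data-preparation sketch, not the thesis's pipeline; the function name and `negatives_per_positive` are illustrative, and it assumes that a caption belonging to a different image is unrelated, which will occasionally be false for generic captions.

```python
import random

def make_relation_pairs(pairs, negatives_per_positive=1, seed=0):
    """Build (image_id, caption, label) examples for image-text relation
    classification from a list of (image_id, caption) pairs."""
    if len({img for img, _ in pairs}) < 2:
        raise ValueError("need captions from at least two different images")
    rng = random.Random(seed)
    examples = [(img, cap, 1) for img, cap in pairs]  # true pairs are positives
    for img, _ in pairs:
        for _ in range(negatives_per_positive):
            # Assumption: any caption of a different image is unrelated.
            other_img, other_cap = rng.choice(pairs)
            while other_img == img:
                other_img, other_cap = rng.choice(pairs)
            examples.append((img, other_cap, 0))
    rng.shuffle(examples)
    return examples
```

Keeping the positive-to-negative ratio fixed across subsets makes accuracy figures on different datasets easier to compare.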
|
5 |
Layout Analysis on Modern Newspapers Using the Object Detection Model Faster R-CNN
Funkquist, Mikaela, January 2022
As society becomes increasingly digitized, the amount of digital data is growing rapidly. Newspapers are one example: many libraries around the world store them as digital images. This creates a great opportunity for research on newspapers, and one particular research area is Document Layout Analysis, in which a document is divided into segments that are then classified. In this thesis, modern newspaper pages provided by KBLab were used to investigate how well a deep learning model developed for general object detection performs in this area. In particular, the Faster R-CNN object detection model was trained on manually annotated newspaper pages from two Swedish publishers, Dagens Nyheter and Aftonbladet. All pages were taken from editions published between 2010 and 2020, so only modern newspapers were considered. The methodology involved sampling editions from the given publishers and time period and then manually annotating them by marking the desired layout elements with bounding boxes. The classes considered were: headlines, subheadlines, decks, charts/infographics, photographs, pull quotes, cartoons, fact boxes, bylines/credits, captions, tableaus and tables. Given the annotated data, a Faster R-CNN with a ResNet-50-FPN backbone was trained on both the Dagens Nyheter and Aftonbladet training sets and then evaluated on different test sets. An mAP@0.5:0.95 of 0.6 was achieved over all classes, while class-wise evaluation indicates precision of around 0.8 for some classes, such as tableaus, decks and photographs.
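The mAP@0.5:0.95 figure reported above averages average precision (AP) over ten IoU thresholds, from 0.50 to 0.95 in steps of 0.05. The sketch below computes it for a single class, using greedy matching of detections to ground-truth boxes in descending score order and all-point interpolation of the precision-recall curve. This is an assumption for illustration: the official COCO evaluation uses 101-point interpolation and slightly different matching, so numbers can differ. The all-class value would average the per-class results.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truth, thr):
    """detections: list of (image_id, score, box) for one class;
    ground_truth: dict image_id -> list of boxes. All-point interpolation."""
    n_gt = sum(len(v) for v in ground_truth.values())
    if n_gt == 0:
        return 0.0
    used = {k: [False] * len(v) for k, v in ground_truth.items()}
    tps = []
    for img, _, box in sorted(detections, key=lambda d: -d[1]):
        best, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best:
                best, best_j = overlap, j
        if best >= thr and not used[img][best_j]:
            used[img][best_j] = True  # each ground-truth box matches once
            tps.append(1)
        else:
            tps.append(0)  # low overlap or duplicate detection
    precisions, recalls, tp = [], [], 0
    for k, t in enumerate(tps, start=1):
        tp += t
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # Make precision monotonically non-increasing (interpolation).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

def map_50_95(detections, ground_truth):
    """Mean of AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return sum(average_precision(detections, ground_truth, t) for t in thresholds) / len(thresholds)
```

A box with IoU 0.77 counts as correct only at the six thresholds up to 0.75, giving 0.6 for that detection alone. This is why mAP@0.5:0.95 rewards tight localization much more than mAP@0.5 does.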
|
6 |
METHOD FOR AUTOMATIC DETECTION OF STAMPS IN SCANNED DOCUMENTS USING DEEP LEARNING AND SYNTHETIC DATA GENERATION BY INSTANCE AUGMENTATION
THALES LEVI AZEVEDO VALENTE, 11 August 2022
Scanned documents in business environments have replaced large volumes of paper. Authorized professionals use stamps to certify critical information in these documents, and many companies need to verify that incoming and outgoing documents are properly stamped. In most inspection situations, people identify stamps by visual inspection, so manual stamp checking is tiring, error-prone, and inefficient in terms of time spent and results. Errors in manual stamp checking can lead to fines from regulatory bodies and interruptions of operations, and can even compromise workflows and financial transactions. This work proposes two methods that, combined, address this problem by fully automating stamp detection in real-world scanned documents. The methods can handle datasets containing many stamp types with small sample sizes, multiple overlaps, different combinations per page, and missing data. The first method proposes a deep network architecture designed from the relationship between the problems identified in real-world stamps and the challenges and solutions of the object detection task reported in the literature. The second method proposes a novel instance augmentation pipeline that builds stamp datasets from real data, to investigate whether stamp types with insufficient samples can be detected. We evaluate the hyperparameters of the instance augmentation approach, and the obtained results, using a Deep Explainability method. By combining these two methods, we achieve state-of-the-art results for the stamp detection task: 97.3% precision and 93.2% recall.
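Instance augmentation for stamps can be illustrated with a copy-paste sketch: cropped stamp instances are pasted at random positions on a page, overlaps are allowed, and each paste yields a bounding-box annotation without manual labeling. The grayscale list-of-rows images, the multiply blending (so overlapping ink darkens as real ink would), and the names `paste_stamp`, `augment_page` and `stamp_bank` are all assumptions; the thesis's actual pipeline and hyperparameters are not described in the abstract.

```python
import random

def paste_stamp(page, stamp, rng):
    """Paste a grayscale stamp (0 = ink, 255 = paper) at a random position on
    a grayscale page, in place. Assumption: multiply blending, so the paper
    background of the stamp leaves the page unchanged and ink darkens it.
    Returns the bounding box (x1, y1, x2, y2)."""
    ph, pw = len(page), len(page[0])
    sh, sw = len(stamp), len(stamp[0])
    if sh > ph or sw > pw:
        raise ValueError("stamp larger than page")
    y0 = rng.randint(0, ph - sh)
    x0 = rng.randint(0, pw - sw)
    for dy in range(sh):
        row = page[y0 + dy]
        for dx in range(sw):
            row[x0 + dx] = row[x0 + dx] * stamp[dy][dx] // 255
    return (x0, y0, x0 + sw, y0 + sh)

def augment_page(page, stamp_bank, n_stamps, seed=0):
    """Return a new page with n_stamps random stamps and their annotations.
    stamp_bank: list of (label, stamp_image) instances cut from real data."""
    rng = random.Random(seed)
    out = [row[:] for row in page]  # never modify the original page
    annotations = []
    for _ in range(n_stamps):
        label, stamp = rng.choice(stamp_bank)
        annotations.append((label, paste_stamp(out, stamp, rng)))
    return out, annotations
```

Rare stamp types can be oversampled simply by listing their instances more often in `stamp_bank`, which is one way such a pipeline could help classes with insufficient samples.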
|
7 |
Convolutional Neural Networks for Named Entity Recognition in Images of Documents
van de Kerkhof, Jan, January 2016
This work researches named entity recognition (NER) with respect to images of documents with a domain-specific layout, by means of Convolutional Neural Networks (CNNs). Examples of such documents are receipts, invoices, forms and scientific papers, the latter of which are used in this work. An NER task is first performed statically, where a static number of entity classes is extracted per document. Networks based on the deep VGG-16 network are used for this task. Here, experimental evaluation shows that framing the task as a classification task, where the network classifies each bounding box coordinate separately, leads to the best network performance. Also, a multi-headed architecture is introduced, where the network has an independent fully-connected classification head per entity. VGG-16 achieves better performance with the multi-headed architecture than with its default, single-headed architecture. Additionally, it is shown that transfer learning does not improve performance of these networks. Analysis suggests that the networks trained for the static NER task learn to recognise document templates, rather than the entities themselves, and therefore do not generalize well to new, unseen templates. For a dynamic NER task, where the type and number of entity classes vary per document, experimental evaluation shows that, on large entities in the document, the Faster R-CNN object detection framework achieves comparable performance to the networks trained on the static task. Analysis suggests that Faster R-CNN generalizes better to new templates than the networks trained for the static task, as Faster R-CNN is trained on local features rather than the full document template. Finally, analysis shows that Faster R-CNN performs poorly on small entities in the image and suggestions are made to improve its performance.
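Framing box prediction as classification, as described above, means quantizing each coordinate of (x1, y1, x2, y2) into a fixed number of bins, giving each coordinate its own classifier, and decoding the winning bin back to a pixel position. A minimal sketch; the uniform bins, the bin count `n_bins`, bin-centre decoding and the function names are assumptions, not taken from the thesis. With this scheme the quantization error is at most half a bin width. In the multi-headed architecture, each entity would get its own set of these coordinate heads.

```python
def encode_box(box, image_size, n_bins):
    """Quantize (x1, y1, x2, y2) into one class index per coordinate.
    Assumption: uniform bins over the image width (x) and height (y)."""
    w, h = image_size
    scales = (w, h, w, h)
    return tuple(min(n_bins - 1, int(c / s * n_bins)) for c, s in zip(box, scales))

def decode_box(bins, image_size, n_bins):
    """Map each predicted bin back to the pixel position of its centre."""
    w, h = image_size
    scales = (w, h, w, h)
    return tuple((b + 0.5) / n_bins * s for b, s in zip(bins, scales))

def predict_box(head_scores, image_size):
    """head_scores: four score lists, one per coordinate head (x1, y1, x2, y2).
    Each head is an independent classifier; take the argmax of each."""
    bins = tuple(max(range(len(s)), key=s.__getitem__) for s in head_scores)
    return decode_box(bins, image_size, len(head_scores[0]))
```

During training, `encode_box` would produce the four class targets for a cross-entropy loss per head.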
|
8 |
ERROR DETECTION IN PRODUCTION LINES VIA DEPENDABLE ARCHITECTURES IN CONVOLUTIONAL NEURAL NETWORKS
Olsson, Erik, January 2023
The demand for products has increased over the last few years, and this high demand must be met with greater production capacity. Neural networks could be key to increasing production without compromising product quality or the well-being of human workers. This thesis examines the concept of reliable (dependable) architectures in convolutional neural networks and how they can be implemented. The neural networks are trained to recognize features in images in order to identify certain objects; these recognitions are then compared with those of other models to see which of them made the best prediction. Using multiple models creates a reliable architecture from which results can be produced, and these results can then be used in combination with algorithms to improve prediction certainty. The aim of combining the networks with these algorithms is to improve the results without having to change the networks' configurations.
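One simple way to combine the predictions of several independently trained models is majority voting with an agreement score, in the spirit of N-version designs used in dependable systems. The abstract does not name the combination algorithms, so this sketch is an assumption: `vote` is a hypothetical helper, ties are broken by summed confidence, and an optional `quorum` makes the ensemble abstain (return `None`) when too few models agree.

```python
from collections import defaultdict

def vote(predictions, quorum=0.0):
    """Majority vote over (label, confidence) pairs from independent models.

    Returns (label, agreement), where agreement is the fraction of models that
    voted for the winning label. Assumptions: ties are broken by summed
    confidence; if agreement < quorum the ensemble abstains with label None.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    votes = defaultdict(int)
    confidence = defaultdict(float)
    for label, conf in predictions:
        votes[label] += 1
        confidence[label] += conf
    best = max(votes, key=lambda lab: (votes[lab], confidence[lab]))
    agreement = votes[best] / len(predictions)
    if agreement < quorum:
        return None, agreement  # too much disagreement: defer, e.g. to a human
    return best, agreement
```

The networks themselves are untouched; only their outputs are combined, which matches the stated aim of improving results without changing the networks' configurations.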
|
9 |
Defect Detection and OCR on Steel
Grönlund, Jakob; Johansson, Angelina, January 2019
In large-scale production of metal sheets, it is important to have an effective way to continuously inspect the products passing through the production line. The inspection mainly consists of detecting defects and tracking ID numbers. This thesis investigates the possibility of creating an automatic inspection system by evaluating different machine learning algorithms for defect detection and optical character recognition (OCR) on metal sheet data. Digit recognition and defect detection are solved separately: the former compares the object detection algorithm Faster R-CNN with the classical machine learning algorithm NCGF, and the latter is based on unsupervised learning using a convolutional autoencoder (CAE). The advantage of the feature extraction method (NCGF) is that it needs only a couple of samples to classify new digits, which is desirable here given the lack of training data. Faster R-CNN, on the other hand, needs much more training data to solve the same problem. However, NCGF fails to classify noisy images and images of metal sheets containing an alloy, while Faster R-CNN appears to be a more promising solution, with a final mean average precision of 98.59%. The CAE approach for defect detection showed promising results. The autoencoder learned to reconstruct only images without defects, so a defect produces a reconstruction error. These errors are first classified with a basic thresholding approach, resulting in 98.9% accuracy. However, this classifier requires supervised learning, which is why the clustering algorithm Gaussian mixture model (GMM) is investigated as well. The results show that GMM should be usable, but that it requires a lot of GPU resources in an end-to-end solution with a CAE.
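The thresholding step for the autoencoder can be sketched as follows: compute the reconstruction error of each image, then choose the threshold that best separates labelled defect-free and defective examples, which illustrates why this classifier needs labels. The mean-squared-error measure, the accuracy-maximizing search over observed error values and the function names are assumptions; the CAE itself is out of scope, and `reconstruction` stands for its output.

```python
def reconstruction_error(image, reconstruction):
    """Mean squared error between two equally sized grayscale images
    (lists of rows). Assumption: MSE is the error measure."""
    total, n = 0.0, 0
    for row_a, row_b in zip(image, reconstruction):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            n += 1
    return total / n

def fit_threshold(errors, labels):
    """Pick the threshold that maximizes accuracy on labelled data
    (label 1 = defect; an image is flagged when its error > threshold).
    Assumption: candidate thresholds are the observed error values."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(errors)):
        acc = sum((e > t) == bool(y) for e, y in zip(errors, labels)) / len(errors)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

An unsupervised alternative, such as the GMM investigated in the thesis, would instead cluster the errors into "normal" and "defect" groups without using `labels`.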
|
10 |
Detection and Classification of Road Users in Aerial Imagery Based on Deep Neural Networks
Hlavoň, David, January 2018
This master's thesis deals with a vehicle detector based on convolutional neural networks, applied to scenes captured by a drone. The dataset is described first, because the main aim of this thesis is to create a practically usable detector. The next chapter describes the feed-forward neural network architectures from which the detector was built. These are followed by techniques for building a detector, from naive methods to the currently most successful meta-architectures. The second part of the thesis describes the implementation of the detector. The final detector was built on the Faster R-CNN meta-architecture with the PVA neural network, and achieved a score of over 90% at 45 full-HD frames per second.
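For scale, the reported throughput corresponds to the following per-frame time budget and pixel rate, assuming full HD means 1920 × 1080 pixels:

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{45} \approx 22.2\ \text{ms},
\qquad
1920 \times 1080 \times 45 = 93\,312\,000 \approx 9.3 \times 10^{7}\ \text{pixels/s}
```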
|