81 |
Lost in Transcription : Evaluating Clustering and Few-Shot Learning for Transcription of Historical Ciphers. Magnifico, Giacomo. January 2021 (has links)
While there has been steady development of Optical Character Recognition (OCR) techniques for printed documents, the instruments that provide good-quality transcriptions of hand-written manuscripts through Handwritten Text Recognition (HTR) methods are still some steps behind. Focusing on historical ciphers (i.e. encrypted documents from the past that use various symbol sets), this thesis examines the performance of two machine learning architectures developed within the DECRYPT project framework: a clustering-based unsupervised algorithm and a semi-supervised few-shot deep-learning model. Both models are tested on seen and unseen scribes to evaluate their difference in performance and their shortcomings, with the secondary goal of determining how the datasets influence performance. An in-depth analysis of the transcription results is performed, with particular focus on the Alchemic and Zodiac symbol sets and on model performance relative to character shape and size. The results show the promising performance of the few-shot architecture compared to the clustering algorithm, with respective Symbol Error Rate (SER) averages of 0.336 (0.15 and 0.104 on seen data / 0.754 on unseen data) and 0.596 (0.638 and 0.350 on seen data / 0.8 on unseen data).
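The Symbol Error Rate quoted in this abstract can be illustrated with a minimal sketch. This is not the DECRYPT project's code; it simply assumes the common definition of SER as the edit distance between the transcription and the reference, normalised by the reference length:

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over symbol sequences.
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[m][n]

def symbol_error_rate(ref, hyp):
    # SER: edit operations needed to turn the hypothesis into the
    # reference, normalised by the reference length.
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, an SER of 0.336 means roughly one edit per three reference symbols.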
|
82 |
Metody hlubokého učení pro zpracování obrazů / Deep Learning Methods for Image Processing. Křenek, Jakub. January 2017 (has links)
This master's thesis deals with deep learning methods for image recognition tasks, from the earliest methods to modern ones. The main focus is on convolutional neural network based models for classification, detection and image segmentation. These methods are applied in a practical implementation: counting passing cars in video from a traffic camera. After several tests of the available models, the YOLOv2 architecture was chosen and retrained on a custom dataset. The application also incorporates the SORT tracking algorithm.
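Once a detector and a tracker such as SORT assign a stable ID to each vehicle, counting reduces to detecting when a track's centroid crosses a virtual line. A hedged sketch of that final step (the track layout and line test are assumptions, not the thesis's actual code):

```python
def count_line_crossings(tracks, line_y):
    """Count vehicles whose tracked centroid crosses a horizontal line.

    tracks: dict mapping track_id -> list of (frame, centroid_y) samples.
    A track is counted once, the first time two consecutive samples
    straddle line_y (their offsets from the line change sign).
    """
    counted = set()
    for track_id, samples in tracks.items():
        samples = sorted(samples)  # order samples by frame number
        for (_, prev_y), (_, cur_y) in zip(samples, samples[1:]):
            if (prev_y - line_y) * (cur_y - line_y) < 0:  # sign change => crossing
                counted.add(track_id)
                break
    return len(counted)
```

Counting IDs rather than per-frame detections is what prevents one slow car from being counted in every frame it appears.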
|
83 |
Vyhledávání zájmových objektů ve videu / Object Instance Search in Video. Iakymets, Bohdan. January 2020 (has links)
This work focuses on creating a mobile application that helps visitors of galleries and museums find interesting information about visual art objects more easily.
|
84 |
Mobilní aplikace pro automatický záznam šachové partie / Mobile Application for Automatic Recording of Chess Games. Jiruška, Adam. January 2020 (has links)
This thesis focuses on developing an application for mobile devices that records the progress of a chess game. This is achieved by image recognition on the camera input. Chess pieces are classified by a neural network. The application is intended for use during training or real matches, recording games so that they can later be analyzed. For analysis, the application produces a record in standard algebraic notation. The user can also add notes to every game.
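The step from recognized board positions to standard algebraic notation can be sketched as a simple coordinate mapping. This is an illustrative simplification (disambiguation, checks and castling are omitted), not the thesis's implementation:

```python
FILES = "abcdefgh"

def square_name(col, row):
    # col 0..7 maps to files a..h, row 0..7 maps to ranks 1..8.
    return f"{FILES[col]}{row + 1}"

def move_to_san(piece, dst, capture=False):
    """Very simplified algebraic notation: piece letter ('' for a pawn),
    'x' on captures, then the destination square."""
    return f"{piece}{'x' if capture else ''}{square_name(*dst)}"
```

Comparing two consecutive recognized board states yields the moved piece and its destination square, which feed directly into `move_to_san`.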
|
85 |
Sledování osob v záznamu z dronu / Tracking People in Video Captured from a Drone. Lukáč, Jakub. January 2020 (has links)
This work addresses recording the position of people in drone camera footage and determining their location. The absolute position of a tracked person is derived relative to the camera position, i.e. relative to the placement of a drone equipped with the appropriate sensors. After processing, the collected data are rendered as the corresponding paths. The work further aims to use available solutions to the constituent subproblems: detecting people in the image, identifying individual people over time, determining an object's distance from the camera, and processing the required sensor data. The examined methods are then used to design a solution that works on this problem in real time. The implementation uses an Intel NCS accelerator together with a Raspberry Pi mounted directly on the drone. The resulting system can generate output about the position of people in the camera's view and present it accordingly.
|
86 |
Sledování osob ve videu z dronu / Tracking People in Video Captured from a Drone. Lukáč, Jakub. January 2021 (has links)
This work addresses recording the position of people in drone camera footage and determining their location. The absolute position of a tracked person is derived relative to the camera position, i.e. relative to the placement of a drone equipped with the appropriate sensors. After processing, the collected data are rendered as the corresponding paths in a graph. The work further aims to use available solutions to the constituent subproblems: detecting people in the image, identifying individual people over time, determining an object's distance from the camera, and processing the required sensor data. The examined methods are then used to design a solution that works on this problem in real time. The implementation uses an Intel NCS accelerator together with a Raspberry Pi mounted directly on the drone. The resulting system can generate output about the position of detected people in the camera's view and present it accordingly.
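One common way to estimate a detected person's distance from a monocular camera, as in the subproblem listed above, is the pinhole model with an assumed real-world height. A hedged sketch (the 1.7 m reference height and the parameter names are assumptions for illustration, not values from the thesis):

```python
def distance_from_height(focal_px, real_height_m, bbox_height_px):
    """Pinhole-camera depth estimate: an object of known real-world height
    H that appears h pixels tall lies at distance Z = f * H / h,
    with the focal length f expressed in pixels."""
    return focal_px * real_height_m / bbox_height_px
```

With a focal length of 1000 px and an assumed person height of 1.7 m, a 100 px tall bounding box places the person about 17 m from the camera.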
|
87 |
Compare Accuracy of Alternative Methods for Sound Classification on Environmental Sounds of Similar Characteristics. Rudberg, Olov. January 2022 (has links)
Artificial neural networks have in the last decade been a vital tool in image recognition, signal processing and speech recognition. Because these networks are highly flexible, they suit a wide variety of data, a property much sought after in the field of environmental sound classification. This thesis investigates whether audio from three types of water usage can be distinguished and classified. The usage types investigated are handwashing, showering and WC-flushing. The data originally consisted of sound recordings in WAV format, which were converted into spectrograms, i.e. visual representations of audio signals. Two neural networks are evaluated on this image classification problem: a Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN). The spectrograms are further subjected to image preprocessing with a Sobel filter, a Canny edge detector and a Gabor filter, as well as to data augmentation through brightness and zoom alterations. The results showed that the CNN clearly outperformed the MLP. Neither the preprocessing techniques nor the augmentation, nor a combination of the two, improved model performance. An important finding was that making the CNN's convolutional and pooling filters rectangular, alternating between horizontal and vertical filter orientations on the input spectrogram, gave superior results; this seemed to capture more information, since spectrograms mainly carry information along the horizontal and vertical directions. This model achieved 91.14% accuracy. The resulting model architecture is a further contribution to the environmental sound classification community. / Master's thesis approved 20 June 2022.
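The WAV-to-spectrogram conversion described above can be sketched with a short-time Fourier transform. This is a generic illustration of the idea, assuming a Hann window and 50% overlap rather than the thesis's exact settings:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform:
    slice the signal into overlapping Hann-windowed frames and take
    the magnitude of each frame's FFT (positive frequencies only)."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    # Shape: (frequency bins, time frames) -- frequency on the vertical
    # axis and time on the horizontal, i.e. the image fed to a CNN.
    return np.array(frames).T
```

The resulting axes are exactly why rectangular filters pay off: harmonics run horizontally (constant frequency over time) while transients run vertically (many frequencies at one instant).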
|
88 |
Automation of forest road inventory using computer vision and machine learning / Automatisering av skogsvägsinventering med hjälp av datorseende och maskininlärning. de Flon, Jacques. January 2023 (has links)
There are around 300,000 kilometers of gravel roads throughout the Swedish countryside, used every day by private individuals and companies. These roads face constant wear from harsh weather as well as heavy traffic, so regular maintenance is required to keep up the road standard. Cost-effective maintenance requires knowing where support is needed, and such data is obtained through inventorying. Today, road inventory is done primarily by hand, using manual tools and trained personnel. With new tools, this work could be partially automated, saving cost and opening up more complex analyses. This project investigates the possibility of automating road inventory using computer vision and machine learning. Previous work in the field shows promising results using deep convolutional networks to detect and classify road anomalies such as potholes and cracks on paved roads. With those results in mind, we try to translate the solutions to unpaved forest roads. During the project, we collected our own dataset containing 3522 labelled images of gravel and forest roads: 203 instances of potholes, 614 bare roads and 3099 snow-covered roads. These images were used to train an image segmentation model based on the YOLOv8 architecture for 30 epochs. Using transfer learning, we took advantage of weights pretrained on the COCO dataset. The predicted road segmentations were also used to estimate the width of the road, using the pinhole camera model and inverse projective geometry. The segmentation model reaches an AP50-95 of 0.746 for bare road and 0.813 for snow-covered road, but shows poor detection of potholes, with an AP50-95 of 0.048. Using the road segmentations to estimate road width shows that the model can do so with a mean absolute error of 0.24 m.
The results from this project show that there are already areas where machine learning could assist human operators with inventory work. Even difficult tasks, such as estimating the width of partially covered roads, can be solved with computer vision and machine learning.
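The width estimation described above inverts the pinhole projection: a road segment spanning some pixel width at a known depth maps back to a metric width. A minimal sketch of that relation (parameter names and values are illustrative assumptions, not the thesis's code):

```python
def road_width_m(mask_width_px, depth_m, focal_px):
    """Inverse pinhole projection: a segmentation-mask row spanning
    mask_width_px pixels at depth Z metres corresponds to a real-world
    width of W = w_px * Z / f, with the focal length f in pixels."""
    return mask_width_px * depth_m / focal_px
```

For example, with a 1000 px focal length, a mask row 500 px wide at 8 m depth implies a 4 m wide road; an error of a few pixels at that depth stays well inside the 0.24 m mean error reported above.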
|
89 |
Specialization of an Existing Image Recognition Service Using a Neural Network. Ersson, Sara; Dahl, Oskar. January 2018 (has links)
To help combat the environmental impact caused by humans, this project investigates one way to simplify the waste management process. The idea is to use image recognition to identify what material a recyclable object is made of. A large dataset containing labeled images of trash, called Trashnet, was analyzed using Google Cloud Vision. Since this API is not written for material detection specifically, a feed-forward neural network was created using Tensorflow and trained on the output from Google Cloud Vision. The network thus learned how different word combinations from Google Cloud Vision implied one of five different materials: glass, plastic, paper, metal or combustible waste. The network checked for 518 unique words in the input and ran them through two hidden layers of 1000 nodes each, before a one-hot output layer. This neural network reached an accuracy of around 60%, which beat Google Cloud Vision's meager accuracy of around 30%. An application with which the user can take pictures of the object he or she would like to recycle could be developed for educational purposes, letting the user know what material the waste is made of and, with this information, throw it in the right bin.
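The 518-word input layer described above amounts to encoding the label words returned for one image as a fixed-length multi-hot vector. A hedged sketch of that encoding step (the toy vocabulary is an assumption; the thesis's vocabulary has 518 entries):

```python
def encode_labels(words, vocabulary):
    """Turn the label words returned for one image into a fixed-length
    multi-hot vector over a known vocabulary (one slot per unique word).
    This vector is what the downstream feed-forward network consumes."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vec = [0] * len(vocabulary)
    for word in words:
        if word in index:       # words outside the vocabulary are ignored
            vec[index[word]] = 1
    return vec
```

The network then only has to learn which combinations of active slots imply which of the five materials.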
|
90 |
Interpretation of Swedish Sign Language using Convolutional Neural Networks and Transfer Learning. Halvardsson, Gustaf; Peterson, Johanna. January 2020 (has links)
The automatic interpretation of signs of a sign language involves image recognition. An appropriate approach for this task is deep learning, and in particular Convolutional Neural Networks. This method typically needs large amounts of data to perform well; transfer learning can be a feasible way to achieve high accuracy despite a small dataset. This thesis tests the hypothesis that transfer learning works well for interpreting the hand alphabet of Swedish Sign Language. The goal of the project is to implement a model that can interpret signs, as well as to build a user-friendly web application for this purpose. The final testing accuracy of the model is 85%. Since this accuracy is comparable to those reported in other studies, the project's hypothesis is supported. The final network is based on the pre-trained model InceptionV3 with five frozen layers, and the optimization algorithm mini-batch gradient descent with a batch size of 32 and a step-size factor of 1.2. Transfer learning is used, but not to the extent that the network becomes too specialized in the pre-trained model and its data. The network has been shown to be unbiased on diverse testing datasets. Suggestions for future work include integrating dynamic signing data to interpret words and sentences, evaluating the method on another sign language's hand alphabet, and integrating dynamic interpretation in the web application so that several letters or words can be interpreted in sequence. In the long run, this research could benefit deaf people who have access to technology, and enhance good health, quality education, decent work, and reduced inequalities.
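The mini-batch gradient descent mentioned above (batch size 32) can be illustrated on a simple least-squares model. This is a generic sketch of the optimizer, not the thesis's InceptionV3 training loop; the learning rate and epoch count are illustrative assumptions:

```python
import numpy as np

def minibatch_gd(X, y, batch_size=32, lr=0.1, epochs=200, seed=0):
    """Mini-batch gradient descent on a least-squares linear model:
    each epoch shuffles the data, then updates the weights using the
    gradient computed on one batch of `batch_size` examples at a time."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of the mean squared error over this batch only.
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad
    return w
```

Updating on 32-example batches instead of the full dataset is what makes training tractable on the small dataset this thesis works with, at the cost of noisier gradient steps.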
|