1 |
Region-based Convolutional Neural Network and Implementation of the Network Through Zedboard Zynq
Islam, Md Mahmudul 05 1900
Indiana University-Purdue University Indianapolis (IUPUI) / In autonomous driving, medical diagnosis, unmanned vehicles, and many other new technologies, neural networks and computer vision have become extremely popular and influential. In particular, convolutional neural networks (CNNs) are very efficient and accurate at classifying objects. One variant is the Region-based CNN (RCNN), which is the network design we selected for a new FPGA implementation. This network identifies stop signs in an image.
We successfully designed and trained an RCNN in MATLAB and implemented it in hardware for use in an embedded real-world application. The hardware implementation reaches a maximum FPGA utilization of 220 18k BRAMs, 92 DSP48Es, 8156 FFs, and 11010 LUTs, with an on-chip power consumption of 2.235 W. Execution takes 0.31 ms on the FPGA, versus 153 ms in MATLAB on the computer and 46 ms on a GPU.
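The timings quoted above imply a speedup that can be checked with simple arithmetic. The sketch below only divides the reported execution times; the function and constant names are illustrative and do not come from the thesis.

```python
# Execution times quoted in the abstract, in milliseconds.
FPGA_MS = 0.31
MATLAB_COMPUTER_MS = 153.0
MATLAB_GPU_MS = 46.0


def speedup(baseline_ms: float, accelerated_ms: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_ms <= 0:
        raise ValueError("accelerated_ms must be positive")
    return baseline_ms / accelerated_ms


if __name__ == "__main__":
    print(f"FPGA vs. MATLAB on the computer: {speedup(MATLAB_COMPUTER_MS, FPGA_MS):.1f}x")
    print(f"FPGA vs. MATLAB on the GPU: {speedup(MATLAB_GPU_MS, FPGA_MS):.1f}x")
```

By this measure the FPGA run is several hundred times faster than MATLAB on the computer and more than a hundred times faster than the GPU run.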
|
2 |
POTHOLE DETECTION USING DEEP LEARNING AND AREA ASSESSMENT USING IMAGE MANIPULATION
Kharel, Subash 01 June 2021
Every year, drivers spend over 3 billion repairing vehicle damage caused by potholes. Beyond the financial cost, potholes frustrate drivers. With the emergence of automated vehicles, road safety that takes automation into account is becoming a necessity. Deep learning techniques offer intelligent ways to reduce these losses by spotting potholes. Because the world is now so connected that information can be shared almost instantly, pothole information can be communicated to other vehicles and to the Department of Transportation for action. Considerable research has already been devoted to detecting potholes in pavement. In this thesis, we compare two object detection algorithms from the two major classes, single-shot detectors and two-stage detectors, on our own dataset. Comparing the results of Faster RCNN and YOLOv5, we conclude that potholes occupy only a small portion of an image, which makes YOLOv5 less accurate than Faster RCNN at detecting them; taking detection speed into account, however, we suggest that YOLOv5 is the better solution for this task. Using the YOLOv5 model and image processing techniques, we calculate the approximate area of potholes and visualize their shape. The Department of Transportation can use this information to plan the necessary construction tasks, and it can also be used to warn drivers about the severity of a pothole based on its shape and area.
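The abstract does not describe how the area is computed, so the following is only a minimal sketch of one common approach: count the foreground pixels of a binary pothole mask and scale by the ground area one pixel covers. The function name, the mask layout, and the `cm_per_pixel` calibration value are all assumptions made for illustration, not details from the thesis.

```python
from typing import Sequence


def pothole_area_cm2(mask: Sequence[Sequence[int]], cm_per_pixel: float) -> float:
    """Approximate area as (foreground pixel count) x (ground area of one pixel).

    `mask` is a hypothetical binary mask (1 = pothole, 0 = road) cropped to a
    detection box; `cm_per_pixel` is an assumed, constant ground-plane scale.
    """
    if cm_per_pixel <= 0:
        raise ValueError("cm_per_pixel must be positive")
    foreground_pixels = sum(1 for row in mask for value in row if value)
    return foreground_pixels * cm_per_pixel ** 2


# Toy mask with 8 foreground pixels, for illustration only.
example_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]

if __name__ == "__main__":
    print(pothole_area_cm2(example_mask, cm_per_pixel=2.0))
```

A constant scale ignores perspective distortion, so a real pipeline would likely need a camera calibration or a top-down view before this estimate is meaningful.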
|
3 |
Instance Segmentation on depth images using Swin Transformer for improved accuracy on indoor images / Instans-segmentering på bilder med djupinformation för förbättrad prestanda på inomhusbilder
Hagberg, Alfred, Musse, Mustaf Abdullahi January 2022
The Simultaneous Localisation And Mapping (SLAM) problem is an open, fundamental problem in autonomous mobile robotics. One of the most actively researched recent techniques for enhancing SLAM methods is instance segmentation, which solves object detection and semantic segmentation simultaneously. In this thesis, we implement an instance segmentation system using Swin Transformer combined with two state-of-the-art instance segmentation methods, Cascade Mask RCNN and Mask RCNN. We show that depth information improves the average precision (AP) by approximately 7%. We also show that the Swin Transformer backbone model works well with depth images, and that Cascade Mask RCNN outperforms Mask RCNN. However, the results should be interpreted with caution because of the small size of the NYU-depth v2 dataset. Most instance segmentation research uses the COCO dataset, which has a hundred times more images than NYU-depth v2 but does not include depth information.
|
4 |
Region-based Convolutional Neural Network and Implementation of the Network Through Zedboard Zynq
MD MAHMUDUL ISLAM (6372773) 10 June 2019
In autonomous driving, medical diagnosis, unmanned vehicles, and many other new technologies, neural networks and computer vision have become extremely popular and influential. In particular, convolutional neural networks (CNNs) are very efficient and accurate at classifying objects. One variant is the Region-based CNN (RCNN), which is the network design we selected for a new FPGA implementation.
This network identifies stop signs in an image. We successfully designed and trained an RCNN in MATLAB and implemented it in hardware for use in an embedded real-world application. The hardware implementation reaches a maximum FPGA utilization of 220 18k BRAMs, 92 DSP48Es, 8156 FFs, and 11010 LUTs, with an on-chip power consumption of 2.235 W. Execution takes 0.31 ms on the FPGA, versus 153 ms in MATLAB on the computer and 46 ms on a GPU.
|
5 |
Dataset quality assessment through camera analysis: Predicting deviations in plant production
Sadashiv, Aravind January 2022
Different types of images, provided by various combinations of cameras, can help increase and optimize plant growth. Combined with a powerful deep learning model that detects stress indicators in RGB images, they can significantly increase harvest yield. AI solutions in agriculture have not been widely explored, and this thesis aims to take a first step in exploring different techniques for detecting early plant stress. In this work, different types and combinations of camera modules are first reviewed and evaluated based on the amount of information they provide. Using the chosen cameras, we manually set up datasets and annotations, then chose and trained a suitable algorithm to predict deviations from an ideal state in plant production. The algorithm chosen was Faster RCNN, which achieved very high detection accuracy. Alongside the main camera types, a new type of image analysis, named SI-NDVI, is performed using a particular combination of the three main cameras; the results show that it can detect vegetation and indicate whether a plant is stressed. In-depth research is carried out on all these techniques to create a good-quality dataset for early stress detection.
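The abstract does not define the SI-NDVI variant, so the sketch below shows only the standard normalized difference vegetation index, NDVI = (NIR - Red) / (NIR + Red), on which the name appears to build. The function name and the sample reflectance values are illustrative assumptions, not data from the thesis.

```python
def ndvi(nir: float, red: float) -> float:
    """Standard NDVI for one pixel: (NIR - Red) / (NIR + Red), in [-1, 1].

    `nir` and `red` are non-negative reflectance (or intensity) values.
    Returns 0.0 when both are zero to avoid dividing by zero.
    """
    if nir < 0 or red < 0:
        raise ValueError("reflectance values must be non-negative")
    total = nir + red
    return (nir - red) / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical pixels: vegetation reflects strongly in near-infrared.
    print("leaf-like pixel:", ndvi(nir=0.8, red=0.2))
    print("soil-like pixel:", ndvi(nir=0.3, red=0.25))
```

Higher values generally indicate denser or healthier vegetation, which is why an NDVI-style index is a plausible input for stress detection.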
|
6 |
Analýza rozložení textu v historických dokumentech / Text Layout Analysis in Historical Documents
Palacková, Bianca January 2021
The goal of this thesis is to design and implement an algorithm for text layout analysis in historical documents. A neural network, specifically the Faster-RCNN architecture, was used to solve this problem. A dataset of 6,135 images of historical newspapers was used for training and testing. Four neural network models were trained for the thesis: models for detecting words, headings, and text regions, and a model for detecting words based on their position in a line. The outputs of these models were processed to determine the text layout of the input image. A modified F-score metric was used for evaluation; by this metric, the algorithm reached an accuracy of almost 80 %.
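The abstract does not say how the F-score was modified, so the sketch below shows only the standard F-beta score computed from detection counts, as a reference point. The function name and the example counts are illustrative assumptions.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Standard F-beta score from true positives, false positives, false negatives.

    precision = tp / (tp + fp), recall = tp / (tp + fn),
    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).
    Returns 0.0 when the score is undefined (no positives at all).
    """
    if min(tp, fp, fn) < 0 or beta <= 0:
        raise ValueError("counts must be non-negative and beta positive")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts: 8 correct detections, 2 spurious, 2 missed.
    print(f_score(tp=8, fp=2, fn=2))
```

With `beta=1` this is the usual harmonic mean of precision and recall; larger `beta` values weight recall more heavily.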
|
7 |
A Novel Approach for Rice Plant Disease Detection, classification and localization using Deep Learning Techniques
Vadrevu, Surya S V A S Sudheer January 2023
Background. This thesis addresses the critical issue of disease management in rice crops, a key factor in ensuring both food security and the livelihoods of farmers. Objectives. The research focuses on the often-overlooked challenge of precise disease localization within rice plants using deep learning techniques. The primary goal is not only to classify diseases accurately but also to pinpoint their exact locations, a vital aspect of effective disease management. The research encompasses early disease detection, classification, and the precise identification of disease locations, all of which are crucial components of a comprehensive disease management strategy. Methods. To establish the reliability of the proposed model, a rigorous validation process is conducted using standardized datasets of rice plant diseases. Two research questions guide this study: (1) Can deep learning effectively achieve early disease detection, accurate disease classification, and precise localization of rice plant diseases, especially in scenarios involving multiple diseases? (2) Which deep learning architecture demonstrates the highest accuracy in both disease diagnosis and localization? Three deep learning architectures are evaluated: Masked RCNN, YOLO V8, and SegFormer. Results. The models are assessed on their training and validation accuracy and loss. Masked RCNN achieves a training accuracy of 91.25% and a validation accuracy of 87.80%, with training and validation losses of 0.3215 and 0.4426, respectively. YOLO V8 achieves a training accuracy of 85.50% and a validation accuracy of 80.20%, with training and validation losses of 0.4212 and 0.5623, respectively. SegFormer achieves a training accuracy of 78.75% and a validation accuracy of 75.30%, with training and validation losses of 0.5678 and 0.6741, respectively. Conclusions. This research contributes to the field of agricultural disease management, offering insights that have the potential to enhance crop yield, food security, and the overall well-being of farmers.
|
8 |
Improving Situational Awareness in Aviation: Robust Vision-Based Detection of Hazardous Objects
Levin, Alexandra, Vidimlic, Najda January 2020
Enhanced vision and object detection could be useful in the aviation domain in bad weather or cluttered environments. In particular, they could improve situational awareness and aid the pilot in interpreting the environment and detecting hazardous objects. The fundamental concept of object detection is to interpret which objects are present in an image with the aid of a prediction model or other feature extraction techniques. Constructing a comprehensive data set that describes the operational environment and is robust to weather and lighting conditions is vital if the object detector is to be used in the avionics domain. Evaluating the accuracy and robustness of the constructed data set is crucial, since erroneous detection, meaning that the object detection algorithm fails to detect a potentially hazardous object or falsely detects an object, is a major safety issue. Bayesian uncertainty estimations are evaluated to examine whether they can be used to detect misclassifications, which would enable a Bayesian Neural Network to be used with the object detector to identify erroneous detections. The object detector Faster RCNN with ResNet-50-FPN was used within the Detectron2 development framework, and the accuracy of the object detection algorithm was evaluated using MS-COCO metrics. The setup achieved an AP@[IoU=.5:.95] score of 50.327 %, which decreased by 18.1 % when the model was exposed to weather and lighting conditions. Inducing artificial artefacts and augmentations of luminance, motion, and weather in the training images increased the AP@[IoU=.5:.95] score by 15.6 %. The augmentations improved the robustness needed to maintain accuracy under varying environmental conditions, leaving just a 2.6 % decrease from the initial accuracy.
To fully conclude that the augmentations provide the necessary robustness to variations in environmental conditions, the model needs to be tested on actual images of the operational environment with different weather and lighting phenomena. Bayesian uncertainty estimations show great promise in providing additional information for interpreting objects in the operational environment correctly. Further research is needed to determine whether uncertainty estimations can provide the information necessary to detect erroneous predictions.
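The AP@[IoU=.5:.95] metric quoted above averages average precision over ten intersection-over-union (IoU) thresholds, from 0.50 to 0.95 in steps of 0.05. The sketch below illustrates only the IoU computation and the threshold sweep; it is not the Detectron2 or COCO evaluation code, and the function names and box layout are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def coco_iou_thresholds() -> List[float]:
    """The ten thresholds 0.50, 0.55, ..., 0.95 that AP@[IoU=.5:.95] averages over."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_matched(pred: Box, truth: Box) -> List[float]:
    """Thresholds at which the prediction would count as matching the ground truth."""
    overlap = iou(pred, truth)
    return [t for t in coco_iou_thresholds() if overlap >= t]


if __name__ == "__main__":
    # Hypothetical prediction and ground-truth boxes.
    print(thresholds_matched((0, 0, 10, 10), (0, 0, 10, 8.2)))
```

A detection that overlaps the ground truth loosely counts only at the lower thresholds, which is why this averaged metric rewards precise localization.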
|
9 |
Deep Learning Models for Human Activity Recognition
Albert Florea, George, Weilid, Filip January 2019
The Augmented Multi-party Interaction (AMI) Meeting Corpus database is used to investigate group activity recognition in an office environment. The database provides researchers with remote-controlled meetings and natural meetings in an office environment, with the meeting scenario set in a four-person office room. For group activity recognition, video frames and two-dimensional audio spectrograms were extracted from the AMI database. The video frames were RGB color images, and the audio spectrograms had one color channel. The video frames were produced in batches so that temporal features could be evaluated together with the audio spectrograms. It has been shown that including temporal features, both during model training and when predicting the behavior of an activity, increases validation accuracy compared to models that use only spatial features [1]. Deep learning architectures were implemented to recognize different human activities in the AMI office environment using data extracted from the AMI database. The neural network models were built using the Keras API together with the TensorFlow library. Of the various neural network architectures, those investigated in this project were Residual Neural Network, Visual Geometry Group 16, Inception V3, and RCNN (Recurrent Neural Network). ImageNet weights, provided by the Keras API and optimized for each base model [2], were used to initialize the base models. The base models use ImageNet weights when extracting features from the input data. Feature extraction with the base models, using either ImageNet weights or random weights, showed promising results. Both the deep learning approach using dense layers and the LSTM spatio-temporal sequence prediction were implemented successfully.
|