  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Object Tracking in Games Using Convolutional Neural Networks

Venkatesh, Anirudh 01 June 2018
Computer vision research has been growing rapidly over the last decade. Recent advancements in the field have been widely used in staple products across various industries. The automotive and medical industries have even pushed cars and equipment into production that use computer vision. However, there seems to be a lack of computer vision research in the game industry. With the advent of e-sports, competitive and casual gaming have reached new heights with regard to players, viewers, and content creators. This has allowed for avenues of research that did not exist prior. In this thesis, we explore the practicality of object detection as applied in games. We designed a custom convolutional neural network detection model, SmashNet. The model was improved through classification weights generated from pre-training on the Caltech101 dataset with an accuracy of 62.29%. It was then trained on 2296 annotated frames from the competitive 2.5-dimensional fighting game Super Smash Brothers Melee to track coordinate locations of 4 specific characters in real-time. The detection model performs at a 68.25% accuracy across all 4 characters. In addition, as a demonstration of a practical application, we designed KirbyBot, a black-box adaptive bot which performs basic commands reactively based only on the tracked locations of two characters. It also collects very simple data on player habits. KirbyBot runs at a rate of 6-10 fps. Object detection has several practical applications with regard to games, ranging from better AI design, to collecting data on player habits or game characters for competitive purposes or improvement updates.
22

Automatic Semantic Segmentation of Indoor Datasets

Rachakonda, Sai Swaroop January 2024
Background: In recent years, computer vision has undergone significant advancements, revolutionizing fields such as robotics, augmented reality, and autonomous systems. Key to this transformation is Simultaneous Localization and Mapping (SLAM), a fundamental technology that allows machines to navigate and interact intelligently with their surroundings. Challenges persist in harmonizing spatial and semantic understanding, as conventional methods often treat these tasks separately, limiting comprehensive evaluations with shared datasets. As applications continue to evolve, the demand for accurate and efficient image segmentation ground truth becomes paramount. Manual annotation, a traditional approach, proves to be both costly and resource-intensive, hindering the scalability of computer vision systems. This thesis addresses the urgent need for a cost-effective and scalable solution by focusing on the creation of accurate and efficient image segmentation ground truth, bridging the gap between spatial and semantic tasks. Objective: This thesis addresses the challenge of creating an efficient image segmentation ground truth to complement datasets with spatial ground truth. The primary objective is to reduce the time and effort taken for annotation of datasets. Method: Our methodology adopts a systematic approach to evaluate and combine existing annotation techniques, focusing on precise object detection and robust segmentation. By merging these approaches, we aim to enhance annotation accuracy while streamlining the annotation process. This approach is systematically applied and evaluated across multiple datasets, including the NYU V2 dataset (consisting of over 1449 images), ARID (a real-world sequential dataset), and Italian flats (a sequential dataset created in Blender).
Results: The developed pipeline demonstrates promising outcomes, showcasing a substantial reduction in annotation time compared to manual annotation, thereby addressing the challenges posed by the cost and resource intensiveness of the traditional approach. We observe that although not initially optimized for SLAM datasets, the pipeline performs exceptionally well on both the ARID and Italian flats datasets, highlighting its adaptability to real-world scenarios. Conclusion: This research introduces an innovative annotation pipeline, offering a systematic and efficient approach to annotation. It tries to bridge the gap between spatial and semantic tasks, addressing the pressing need for comprehensive annotation tools in this domain.
23

ENHANCING PRECISION OF OBJECT DETECTORS: BRIDGING CLASSIFICATION AND LOCALIZATION GAPS FOR 2D AND 3D MODELS

NIRANJAN RAVI (7013471) 03 June 2024
Artificial Intelligence (AI) has revolutionized and accelerated significant advancements in various fields such as healthcare, finance, education, agriculture and the development of autonomous vehicles. We are rapidly approaching Level 5 Autonomy due to recent developments in autonomous technology, including self-driving cars, robot navigation, smart traffic monitoring systems, and dynamic routing. This success has been made possible by Deep Learning technologies and advanced Computer Vision (CV) algorithms. With the help of perception sensors such as camera, LiDAR and RADAR, CV algorithms enable a self-driving vehicle to interact with the environment and make intelligent decisions. Object detection lays the foundations for various applications, such as collision and obstacle avoidance, lane detection, pedestrian and vehicular safety, and object tracking. Object detection has two significant components: image classification and object localization. In recent years, enhancing the performance of 2D and 3D object detectors has sparked interest in the research community. This research aims to resolve the drawbacks associated with localization loss estimation of 2D and 3D object detectors by addressing the bounding box regression problem, to address the class imbalance issue affecting the confidence loss estimation, and finally to propose a dynamic cross-model 3D hybrid object detector with enhanced localization and confidence loss estimation.
This research aims to address challenges in object detectors through four key contributions. In the first part, we aim to address the problems associated with the image classification component of 2D object detectors. Class imbalance is a common problem associated with supervised training. Common causes are noisy data, a scene with a tiny object surrounded by background pixels, or a dense scene with too many objects. These scenarios can produce many negative samples compared to positive ones, affecting the network learning and reducing the overall performance. We examined these drawbacks and proposed an Enhanced Hard Negative Mining (EHNM) approach, which utilizes anchor boxes with 20% to 50% overlap and positive and negative samples to boost performance. The efficiency of the proposed EHNM was evaluated using the Single Shot Multibox Detector (SSD) architecture on the PASCAL VOC dataset, indicating that the detection accuracy of tiny objects increased by 3.9% and 4% and the overall accuracy improved by 0.9%.
To address localization loss, our second approach investigates drawbacks associated with existing bounding box regression, such as poor convergence and incorrect regression. We analyzed various cases, such as when one object contains another, when two objects share the same centre, and when two objects share the same centre and have similar aspect ratios. During our analysis, we observed that the existing Intersection over Union (IoU) loss and its variants fail to address them. We proposed two new loss functions, Improved Intersection Over Union (IIoU) and Balanced Intersection Over Union (BIoU), to enhance performance and minimize computational effort. Two variants of the YOLOv5 model, YOLOv5n6 and YOLOv5s, were utilized to demonstrate the superior performance of IIoU on the PASCAL VOC and CGMU datasets. With the help of ROS and NVIDIA devices, inference speed was observed in real time. Extensive experiments were performed to evaluate the performance of BIoU on object detectors. The evaluation results indicated that the MASK_RCNN network trained on the COCO dataset, the YOLOv5n6 network trained on SKU-110K and the YOLOv5x network trained on the custom e-scooter dataset demonstrated a 3.70% increase on small objects, 6.20% on 55% overlap and 9.03% on 80% overlap.
In the earlier parts, we primarily focused on 2D object detectors. Owing to their success, we extended the scope of our research to 3D object detectors in the later parts. The third portion of our research aims to solve bounding box problems associated with 3D rotated objects. Existing axis-aligned loss functions suffer a performance gap if the objects are rotated. We enhanced the earlier proposed IIoU loss by considering two additional parameters: the object's Z-axis and rotation angle. These two parameters aid in localizing the object in 3D space. Evaluation was performed on LiDAR and fusion methods on the 3D KITTI and nuScenes datasets.
Once we had addressed the drawbacks associated with confidence and localization loss, we further explored ways to increase the performance of cross-model 3D object detectors. We found from previous studies that perception sensors are sensitive to harsh environmental conditions, sunlight, and motion blur. In the final portion of our research, we propose a hybrid 3D cross-model detection network (MAEGNN) equipped with Masked Autoencoders (MAE) and Graph Neural Networks (GNN) along with the earlier proposed IIoU and EHNM. The performance evaluation of MAEGNN on the KITTI validation dataset and KITTI test set yielded a detection accuracy of 69.15%, 63.99%, 58.46% and 40.85%, 37.37% on 3D pedestrians with an overlap of 50%. The developed hybrid detector overcomes the challenges of localization error and confidence estimation and outperforms many state-of-the-art 3D object detectors for autonomous platforms.
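For readers unfamiliar with the baseline that the IIoU and BIoU losses build on, the standard axis-aligned Intersection over Union can be sketched as follows. This is only the plain IoU, not the proposed IIoU or BIoU, whose formulas are not given in the abstract; the function name and the (x1, y1, x2, y2) box layout are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Standard IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative sketch only; not the IIoU/BIoU loss from the thesis.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


outer = (0, 0, 4, 4)
print(iou(outer, (1, 1, 3, 3)))  # inner box centred in the outer box
print(iou(outer, (0, 0, 2, 2)))  # same-size inner box pushed into a corner
```

Both calls give the same value, because a box lying entirely inside another has the same intersection and union wherever it sits. This is one plausible reading of why a plain IoU loss gives little guidance in the contained-box case that the abstract mentions.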
24

Crowd Counting Camera Array and Correction

Fausak, Andrew Todd 05 1900
"Crowd counting" is a term used to describe the process of calculating the number of people in a given context; however, crowd counting has multiple challenges especially when images representing a given crowd span multiple cameras or images. In this thesis, we propose a crowd counting camera array and correction (CCCAC) method using a camera array of scaled, adjusted, geometrically corrected, combined, processed, and then corrected images to determine the number of people within the newly created combined crowd field. The purpose of CCCAC is to transform and combine valid regions from multiple images from different sources and order as a uniform proportioned set of images for a collage or discrete summation through a new precision counting architecture. Determining counts in this manner within normalized view (collage), results in superior counting accuracy than processing individual images and summing totals with prior models. Finally, the output from the counting model is adjusted with learned results over time to perfect the counting ability of the entire counting system itself. Results show that CCCAC crowd counting corrected and uncorrected methods perform superior to raw image processing methods.
25

Enhancing Athletic Training Through AI: A Comparative Analysis Of YOLO Versions For Image Segmentation In Velocity-Based Training

Ågren, Oscar, Palm, Johan January 2024
This work explores the application of Artificial Intelligence (AI) in sports, specifically comparing You Only Look Once (YOLO) version 8 and version 9 models in the context of Velocity-Based Training (VBT) and resistance training. It aims to evaluate the models' performance in instance segmentation and their effectiveness in estimating velocity metrics. Additionally, methods for pixel-to-meter conversion and centroid selection on barbells are developed and discussed. The field of AI is growing rapidly, with great practical possibilities in the sports industry. Traditional methods of collecting and analyzing data involve sensors that are often expensive and not available to many coaches and athletes. By leveraging AI techniques, this work aims to provide insights into more cost-effective solutions. An experiment was conducted in which YOLOv8 and YOLOv9 models of different sizes were trained on a custom dataset. Using the resulting model weights, key VBT metrics were extracted from videos of squat, bench press and deadlift exercises and compared with sensor data. To automatically track the barbell in the videos, the centroids of bounding boxes were used. Additionally, to obtain the velocity in meters per second, pixel-to-meter conversion ratios were obtained using the Circular Hough Transform. Findings indicate that the YOLOv8x model generally excels according to performance metrics, but records a high mean inference time. Additionally, the YOLOv8m model overestimated mean velocity, peak velocity and range of motion, highlighting potential challenges for real-time VBT applications. Otherwise, all models produced results very similar to the sensor data, occasionally differing in scale because of faulty pixel-to-meter conversions. In conclusion, this work underscores AI's potential in the sports industry while identifying areas for further enhancement to ensure accuracy and reliability in applications.
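The centroid tracking and pixel-to-meter conversion described in this abstract can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: it assumes the conversion ratio comes from a circular plate of known diameter (for example, one found by a circle detector), and the function names, the 0.45 m diameter and the 30 fps frame rate are all hypothetical.

```python
def centroid(box):
    """Centre point of a bounding box given as (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def meters_per_pixel(known_diameter_m, detected_diameter_px):
    """Conversion ratio from an object of known real-world size (assumed)."""
    return known_diameter_m / detected_diameter_px


def vertical_velocities(boxes, fps, m_per_px):
    """Per-frame vertical velocity in m/s from consecutive box centroids.

    Image y grows downward, so the sign is flipped to make upward
    movement of the barbell positive.
    """
    ys = [centroid(b)[1] for b in boxes]
    return [-(y1 - y0) * m_per_px * fps for y0, y1 in zip(ys, ys[1:])]


# Hypothetical numbers: a 0.45 m plate that spans 225 px, video at 30 fps.
ratio = meters_per_pixel(0.45, 225)
track = [(100, 200, 140, 240), (100, 190, 140, 230), (100, 185, 140, 225)]
print(vertical_velocities(track, fps=30, m_per_px=ratio))
```

Because the velocity scales linearly with the conversion ratio, any error in the detected diameter carries straight through to every velocity value, which is consistent with the scale differences the abstract attributes to faulty conversions.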
26

Optical Inspection for Soldering Fault Detection in a PCB Assembly using Convolutional Neural Networks

Bilal Akhtar, Muhammad January 2019
Convolutional Neural Networks (CNNs) have been established as a powerful tool to automate various computer vision tasks without requiring any a priori knowledge. Printed Circuit Board (PCB) manufacturers want to improve their product quality by employing vision-based automatic optical inspection (AOI) systems in PCB assembly manufacturing. An AOI system employs classic computer vision and image processing techniques to detect various manufacturing faults in a PCB assembly. Recently, CNNs have been used successfully at various stages of automatic optical inspection. However, none has used a 2D image of the PCB assembly directly as input to a CNN. Currently, all available systems are specific to a PCB assembly and require many preprocessing steps or a complex illumination system to improve accuracy. This master thesis attempts to design an effective soldering fault detection system using a CNN applied to an image of a PCB assembly, with the Raspberry Pi PCB assembly as the case in point. Soldering fault detection is treated as equivalent to an object detection process. YOLO (short for "You Only Look Once") is a state-of-the-art fast object detection CNN. Although it is designed for object detection in images from publicly available datasets, we use YOLO as a benchmark to define the performance metrics for the proposed CNN. Besides accuracy, the effectiveness of a trained CNN also depends on memory requirements and inference time. The accuracy of a CNN increases when a convolutional layer is added, at the expense of increased memory requirements and inference time. The prediction layer of the proposed CNN is inspired by the YOLO algorithm, while the feature extraction layer is customized to our application and is a combination of classical CNN components with a residual connection, an inception module and a bottleneck layer. Experimental results show that state-of-the-art object detection algorithms are not efficient when used on a new and different dataset for object detection. Our proposed CNN detection algorithm predicts more accurately than the YOLO algorithm, with an increase in average precision of 3.0%; it is less complex, requiring 50% fewer parameters; and it infers in half the time taken by YOLO. The experimental results also show that a CNN can be an effective means of performing AOI (given that plenty of data is available for training the CNN).
27

Automated Image Pre-Processing for Optimized Text Extraction Using Reinforcement Learning and Genetic Algorithms

Rohoullah, Rahmat, Joakim, Månsson January 2023
This project aims to develop an automated image pre-processing chain to extract valuable information from appliance labels before recycling. The primary goal is to improve optical character recognition accuracy by addressing noise issues using reinforcement learning and an evolutionary algorithm. Python was selected as the primary programming language for this project due to its extensive support for machine learning and computer vision libraries. Different techniques are implemented to enhance text extraction from labels. Binary Robust Invariant Scalable Keypoints (BRISK) are used to straighten labels and separate the label from the background. You Only Look Once version 8x (YOLOv8x) is then used to extract the regions containing the text of interest. The dataset for the reinforcement learning model and the genetic algorithm is created using BRISK with YOLOv8x. The results showed that pre-processing images in the dataset, provided through BRISK and YOLOv8x, does not affect text extraction accuracy, as suggested by reinforcement learning and evolutionary algorithms.
28

Convolutional neural network based object detection in a fish ladder : Positional and class imbalance problems using YOLOv3 / Objektdetektering i en fisktrappa baserat på convolutional neural networks : Positionell och kategorisk obalans vid användning av YOLOv3

Ekman, Patrik January 2021
Hydropower plants create blockages in fish migration routes. Fish ladders can serve as alternative routes but are complex to install and to follow up for further adaptation and development. In this study, computer vision tools are considered in this regard. More specifically, object detection is applied to images collected in a hydropower plant fish ladder to localise and classify wild, farmed and unknown fish, labelled according to the presence, absence or uncertainty of an adipose fin. Fish migration patterns are not deterministic, making it a challenge to collect representative and balanced data to train a model that is resilient to changing conditions. In this study, two data imbalances are addressed by modifying a YOLOv3 baseline model: foreground-foreground class imbalance is targeted using hard and soft resampling, and positional imbalance using translation augmentation. YOLOv3 is a convolutional neural network that predicts bounding box coordinates, class probabilities and confidence scores simultaneously. It divides images into grids and makes predictions based on grid cell locations and anchor box offsets. Performance is estimated across 10 random data splits and different bounding box overlap thresholds, using (mean) average precision as well as recall, precision and F1 score estimated at optimal validation set confidence thresholds. The Wilcoxon signed-ranks test is used to determine statistical significance. In experiments, the best performance was observed on wild and farmed fish, with F1 scores reaching 94.8 and 89.0 percent respectively. The inconsistent appearance of unknown fish appears harder to generalise to, with a corresponding F1 score of 65.7 percent. Soft sampling and, especially, translation augmentation contributed to enhanced performance and reduced variance, implying that the baseline model is particularly sensitive to positional imbalance. Spatial dependencies introduced by YOLOv3's grid cell strategy likely produce local bias or overfitting. The experimental evaluation highlights the importance of not relying on a single data split when evaluating performance on a moderately large or custom dataset. A key challenge observed in experiments is the choice of a suitable confidence threshold, which influences the dynamics of the results.
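To make the link between the grid cell strategy and positional imbalance concrete, the sketch below shows how a box centre maps to a grid cell and how a translation moves it to a different cell. It is a simplified illustration under stated assumptions: a single 13x13 grid on a 416x416 image rather than YOLOv3's several prediction scales, with hypothetical function names, and it is not the author's code.

```python
def responsible_cell(box, img_w, img_h, grid_size):
    """Return (row, col) of the grid cell that contains the box centre."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Clamp so a centre exactly on the right/bottom edge stays in range.
    col = min(int(cx / img_w * grid_size), grid_size - 1)
    row = min(int(cy / img_h * grid_size), grid_size - 1)
    return row, col


def translate_box(box, dx, dy, img_w, img_h):
    """Shift a box by (dx, dy) pixels and clip it to the image.

    Returns None when the shifted box falls entirely outside the frame.
    """
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1 + dx), min(img_w, x2 + dx)
    y1, y2 = max(0, y1 + dy), min(img_h, y2 + dy)
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)


fish = (10, 10, 50, 50)
print(responsible_cell(fish, 416, 416, 13))
shifted = translate_box(fish, 100, 0, 416, 416)
print(responsible_cell(shifted, 416, 416, 13))
```

If most training fish appear in the same few cells, only the predictors tied to those cells see many positive examples; shifting images and labels together spreads the examples over more cells, which is one way to read the benefit the abstract reports for translation augmentation.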
29

Image Augmentation to Create Lower Quality Images for Training a YOLOv4 Object Detection Model

Melcherson, Tim January 2020
Research in the Arctic is of ever-growing importance, and modern technology is used in new ways to map and understand this very complex region and how it is affected by climate change. Here, animals and vegetation are tightly coupled with their environment in a fragile ecosystem, and when the environment undergoes rapid changes it risks damaging these ecosystems severely. Understanding what kind of data has the potential to be used in artificial intelligence can be of importance, as many research stations have data archives from decades of work in the Arctic. In this thesis, a YOLOv4 object detection model has been trained on two classes of images to investigate the performance impact of disturbances in the training data set. An expanded data set was created by augmenting the initial data to contain various disturbances. A model was successfully trained on the augmented data set, and a correlation between worse performance and the presence of noise was detected, but changes in saturation and altered colour levels seemed to have less impact than expected. Reducing noise in gathered data is seemingly of greater importance than enhancing images that lack colour levels. Further investigations with a larger and more thoroughly processed data set are required to gain a clearer picture of the impact of the various disturbances.
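The kinds of disturbance mentioned in this abstract, added noise and altered saturation, can be illustrated with a small standard-library sketch. The abstract does not say how the thesis generated its augmented images, so the functions, parameters and the nested-list image format below are assumptions chosen to keep the example dependency-free.

```python
import colorsys
import random


def add_gaussian_noise(pixels, sigma, seed=0):
    """Add zero-mean Gaussian noise to each RGB channel, clamped to 0..255.

    `pixels` is a list of rows, each row a list of (r, g, b) integer tuples.
    """
    rng = random.Random(seed)  # seeded so the augmentation is repeatable
    return [
        [tuple(min(255, max(0, round(c + rng.gauss(0, sigma)))) for c in px)
         for px in row]
        for row in pixels
    ]


def scale_saturation(pixels, factor):
    """Multiply HSV saturation by `factor` (0 gives a grey image)."""
    out = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            r2, g2, b2 = colorsys.hsv_to_rgb(h, min(1.0, s * factor), v)
            new_row.append((round(r2 * 255), round(g2 * 255), round(b2 * 255)))
        out.append(new_row)
    return out


image = [[(200, 50, 50), (10, 250, 10)]]
noisy = add_gaussian_noise(image, sigma=25)
washed_out = scale_saturation(image, 0.3)
```

Applying each disturbance at several strengths to copies of the original images gives an expanded data set in which the effect of each disturbance on detection performance can be compared separately.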
30

LOW COST DATA ACQUISITION FOR AUTONOMOUS VEHICLE

Dong Hun Lee (9040400) 29 June 2020
This research involved the challenge of learning to program data-gathering sensors and to design electronic sensor circuits. Autonomous vehicle development is expensive compared to purchasing an economy vehicle such as the Hyundai Elantra. Keeping the development cost down is critical to maintaining a competitive edge on vehicle pricing with newer technologies. Autonomous vehicle sensor integration was designed and then tested for a driving vision data-gathering system, which is required to gather driving vision data using area scan sensors, Lidar, an ultrasonic sensor, and a camera in real road scenarios. The project used a low-cost Lidar of the kind used on drones for on-road testing; other components include myRIO (myRIO Hardware), LabVIEW (LabVIEW software), LIDAR-Lite v3 (Garmin, 2019), an ultrasonic sensor, and a Wantai stepper motor (Polifka, 2020). This research helps to reduce the cost of using autonomous vehicle driving systems in the city. Because of the resolution and detection range of the Lidar, the test environment is limited to city areas. Lidar is the most expensive equipment in autonomous vehicle driving data-gathering systems. This study focuses on replacing the expensive Lidar, ultrasonic sensor, and camera with a drone-scale low-cost Lidar on a full-size vehicle. With this study, low-cost acquisition of autonomous vehicle driving data becomes possible. Lowering the price of driving data acquisition makes it easier for new companies to enter the autonomous vehicle market. Testing with multiple cars at the same time also becomes possible, which reduces data collection time.
