About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Object Tracking in Games Using Convolutional Neural Networks

Venkatesh, Anirudh, 01 June 2018
Computer vision research has grown rapidly over the last decade, and recent advances in the field are widely used in staple products across various industries; the automotive and medical industries have even pushed cars and equipment that use computer vision into production. However, computer vision research in the game industry remains sparse. With the advent of e-sports, competitive and casual gaming have reached new heights in players, viewers, and content creators, opening avenues of research that did not exist before. In this thesis, we explore the practicality of object detection applied to games. We designed a custom convolutional neural network detection model, SmashNet. The model was improved through classification weights generated by pre-training on the Caltech101 dataset to 62.29% accuracy. It was then trained on 2296 annotated frames from the competitive 2.5-dimensional fighting game Super Smash Brothers Melee to track the coordinate locations of 4 specific characters in real time. The detection model achieves 68.25% accuracy across all 4 characters. In addition, as a demonstration of a practical application, we designed KirbyBot, a black-box adaptive bot that performs basic commands reactively based only on the tracked locations of two characters; it also collects very simple data on player habits. KirbyBot runs at 6-10 fps. Object detection has several practical applications in games, ranging from better AI design to collecting data on player habits or game characters for competitive purposes or improvement updates.
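The abstract does not spell out KirbyBot's control logic; as a hedged illustration only, a reactive bot of this kind can map tracked coordinates directly to controller commands. A minimal sketch, assuming hypothetical `grab_frame`, `detect`, and `press` interfaces (none of these names come from the thesis):

```python
import time

# Hypothetical interfaces: detect(frame) returns {character: (x, y)} from a
# trained detection model, and press(button) sends a controller input.
# Both are illustrative stand-ins, not the thesis's actual API.

def reactive_bot(grab_frame, detect, press, me="kirby", foe="fox"):
    """Basic reactive loop: act only on the tracked locations of two characters."""
    while True:
        frame = grab_frame()
        positions = detect(frame)          # e.g. {"kirby": (120, 340), "fox": (480, 330)}
        if me in positions and foe in positions:
            (mx, my), (fx, fy) = positions[me], positions[foe]
            if abs(fx - mx) < 50:          # opponent in range: attack
                press("A")
            elif fx > mx:                  # opponent to the right: approach
                press("RIGHT")
            else:                          # opponent to the left: approach
                press("LEFT")
        time.sleep(0.15)                   # roughly matches the reported 6-10 fps
```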
202

Automating Deep-Sea Video Annotation

Egbert, Hanson, 01 June 2021
As the world explores opportunities to develop offshore renewable energy capacity, there will be a growing need for pre-construction biological surveys and post-construction monitoring in the challenging marine environment. Underwater video is a powerful tool to facilitate such surveys, but the interpretation of the imagery is costly and time-consuming. Emerging technologies have improved automated analysis of underwater video, but these technologies are not yet accurate or accessible enough for widespread adoption in the scientific community or the industries that might benefit from them. To address these challenges, prior research developed a website that allows users to: (1) quickly play and annotate underwater videos, (2) create a short tracking video for each annotation that shows how an annotated concept moves over time, (3) verify the accuracy of existing annotations and tracking videos, (4) create a neural network model from existing annotations, and (5) automatically annotate unwatched videos using a previously created model. Using validated, unvalidated, and tracking-generated annotations, it counts Rathbunaster californicus (starfish) and Strongylocentrotus fragilis (sea urchin) with count accuracies of 97% and 99% and F1 scores of 0.90 and 0.81, respectively. This thesis explores several improvements to that system: first, a method to synchronize JavaScript video frames with a stable Python environment; second, reinforcement training using marine biology experts and the verification feature; and finally, a hierarchical method that lets the model combine predictions of related concepts. On average, the hierarchical method improved F1 scores from 0.42 to 0.45 (a relative increase of 7%) and count accuracy from 58% to 69% (a relative increase of 19%) for the concepts Umbellula lindahli and Funiculina.
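As a hedged illustration of the hierarchical idea (the thesis's exact combination rule is not given in the abstract), one simple scheme backs off to a parent concept when no child class is confident. The taxonomy below is a plausible stand-in, not the thesis's:

```python
# Hypothetical taxonomy: parent concept -> child concepts. Both listed species
# are sea pens, so a shared parent is a reasonable illustrative grouping.
TAXONOMY = {"sea pen": ["Umbellula lindahli", "Funiculina"]}

def combine_hierarchical(scores, taxonomy, threshold=0.5):
    """scores: {concept: confidence} for one detection. Returns (label, score)."""
    best, best_score = max(scores.items(), key=lambda kv: kv[1])
    if best_score >= threshold:
        return best, best_score
    # No confident child: aggregate child confidences into their parent concept.
    for parent, children in taxonomy.items():
        parent_score = sum(scores.get(c, 0.0) for c in children)
        if parent_score >= threshold:
            return parent, parent_score
    return best, best_score  # fall back to the raw argmax

print(combine_hierarchical({"Umbellula lindahli": 0.30, "Funiculina": 0.28}, TAXONOMY))
# -> ('sea pen', 0.58)
```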
203

Semi-Automatic Image Annotation Tool

Alvenkrona, Miranda, Hylander, Tilda, January 2023
Annotation is essential in machine learning. Building an accurate object detection model requires a large, diverse dataset, which poses challenges due to the time-consuming nature of manual annotation. This thesis was made in collaboration with Project Ngulia, which aims at developing technical solutions to protect and monitor wild animals. A contribution of this work was to integrate an efficient semi-automatic image annotation tool within the Ngulia system, with the aim of streamlining the annotation process and improving the employed object detection models. After research into available annotation tools, a custom tool was deemed the most cost-effective and flexible option. It uses object detection model predictions as annotation suggestions, improving the efficiency of the annotation process (a sketch of this flow follows below). The efficiency was evaluated through a user test, with participants annotating on average approximately 2 seconds faster per image when suggestions were shown; a one-way ANOVA test supported this reduction as statistically significant. Additionally, it was investigated which images should be prioritized for annotation in order to obtain the most accurate predictions. Different sampling methods were investigated and compared. The performance of the resulting models remained relatively consistent, with the even-distribution method on top; this indicates that the choice of sampling method may not substantially impact model accuracy, as the methods performed comparably. Moreover, different methods of selecting training data in the re-training process were compared. The difference in performance was considerably small, likely due to the limited and balanced data pool. The experiments did, however, indicate that incorporating previously seen data alongside unseen data can be beneficial, and that a reduced dataset can be sufficient. Further investigation is required to fully understand the extent of these benefits.
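Two of the abstract's technical steps can be sketched concisely: turning detector output into editable suggestions, and testing the timing difference with a one-way ANOVA (here via SciPy's `f_oneway`; the detector interface and the timing numbers are made up for illustration):

```python
from scipy.stats import f_oneway

# Hedged sketch: model predictions become editable pre-annotations instead of
# boxes drawn from scratch. `model` is any detector returning
# (label, box, confidence) tuples; all names are illustrative stand-ins.
def suggest_annotations(image, model, min_conf=0.4):
    return [{"label": label, "box": box, "source": "model"}
            for label, box, conf in model(image) if conf >= min_conf]

# The thesis reports a one-way ANOVA on annotation times; with per-image
# annotation times (seconds) for the two conditions, that test looks like:
with_suggestions = [6.1, 5.8, 7.0, 6.4]      # made-up example values
without_suggestions = [8.3, 8.9, 7.9, 8.6]   # made-up example values
stat, p = f_oneway(with_suggestions, without_suggestions)
print(f"F={stat:.2f}, p={p:.4f}")            # p < 0.05 -> significant difference
```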
204

The research of background removal applied to fashion data : The necessity analysis of background removal for fashion data

Liang, Junhui, January 2022
Fashion understanding is a hot topic in computer vision, with many applications of great business value in the market. It remains a difficult challenge for computer vision due to the immense diversity of garments and the wide range of scenes and backgrounds. In this work, we remove the background of fashion images to boost data quality and ultimately increase model performance. Because fashion images typically show a clearly visible person wearing the garments, we can use Salient Object Detection (SOD) to remove the background of fashion data reliably. We refer to a fashion image with its background removed as a "rembg" image, in contrast to the original image in the fashion dataset. We conduct comparative experiments between these two types of images on multiple aspects of model training, including model architectures, model initialization, compatibility with other training tricks and data augmentations, and target task types. Our experiments suggest that background removal works well for fashion data in simple, shallow networks that are not susceptible to overfitting: it can improve model accuracy by up to 5% in the classification of FashionStyle14 when training models from scratch. However, background removal does not perform well in deep networks due to its incompatibility with regularization techniques such as batch normalization, pre-trained initialization, and data augmentations that introduce randomness. The loss of background pixels invalidates many existing training tricks and adds a risk of overfitting for deep models.
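The abstract does not name the SOD tool used; as one hedged example, the open-source `rembg` package (which wraps a U²-Net salient-object model) can produce the "rembg" variant of a dataset. The paths and the choice of library are assumptions, not the thesis's actual pipeline:

```python
# Hedged sketch using the open-source `rembg` package (pip install rembg),
# which applies a salient-object-detection model to strip backgrounds.
# This is an assumption for illustration; the thesis may use a different tool.
from pathlib import Path
from PIL import Image
from rembg import remove

def make_rembg_dataset(src_dir: str, dst_dir: str) -> None:
    """Produce the 'rembg' variant of every image in a fashion dataset."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for img_path in Path(src_dir).glob("*.jpg"):
        foreground = remove(Image.open(img_path))    # background -> transparent
        foreground.save(Path(dst_dir) / f"{img_path.stem}.png")  # PNG keeps alpha

make_rembg_dataset("fashionstyle14/train", "fashionstyle14_rembg/train")
```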
205

Automation of Closed-Form and Spectral Matting Methods for Intelligent Surveillance Applications

Alrabeiah, Muhammad 16 December 2015 (has links)
Machine-driven analysis of visual data is the hard core of intelligent surveillance systems. Its main goal is to recognize different objects in the video sequence and their behaviour. This is very challenging due to the dynamic nature of the scene and machines' lack of semantic comprehension of visual data. The general flow of the recognition process starts with the object extraction task. Traditionally, this task has been performed using image segmentation, but recent years have seen the emergence of another contender: image matting. As a well-established process, matting has a very rich literature, most of which is dedicated to interactive approaches for applications like movie editing; it was therefore conventionally not considered for visual data analysis. Following the shift toward matting as a means of object extraction, two methods have stood out for their foreground-extraction accuracy and, more importantly, their automation potential: Closed-Form Matting (CFM) and Spectral Matting (SM). They pose the matting process as either a constrained optimization problem or a segmentation-like component selection process. This difference of formulation stems from an interesting difference of perspective on the matting process, opening the door for more automation possibilities. Consequently, both methods have been the subject of automation attempts that produced intriguing results. Given their importance and potential, this thesis provides a detailed discussion and analysis of two of the most successful techniques proposed to automate the CFM and SM methods. It first introduces the theoretical grounds of both matting methods as well as the automatic techniques, then shifts to a full analysis and assessment of the performance and implementation of these automation attempts. The thesis concludes with a brief discussion of possible improvements, within which a hybrid technique is proposed to combine the best features of the two reviewed techniques.
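For context, the formulation both methods build on is the compositing model and the closed-form matting cost of Levin et al., sketched here from the matting literature rather than quoted from the thesis:

```latex
% Compositing model: each pixel is a blend of foreground and background.
I_i = \alpha_i F_i + (1 - \alpha_i) B_i, \qquad \alpha_i \in [0, 1]

% Closed-form matting minimizes a quadratic cost in alpha alone, where L is
% the matting Laplacian and the constraints encode known foreground/background
% pixels (user scribbles, or automatically generated ones in automated variants):
\alpha^{*} = \arg\min_{\alpha} \; \alpha^{\top} L\, \alpha
\quad \text{s.t.} \quad \alpha_i = s_i \;\; \forall i \in \mathcal{S}

% Spectral matting instead builds matting components from the smallest
% eigenvectors of L and selects/combines them, which is what gives it the
% segmentation-like component selection character described above.
```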
206

Exploration of performance evaluation metrics with deep-learning-based generic object detection for robot guidance systems

Gustafsson, Helena January 2023 (has links)
Robots are often used within industry for automated tasks that are too dangerous, complex, or strenuous for humans, which yields time and cost benefits. A robot can have an arm and a gripper to manipulate the world, and camera sensors as eyes to perceive it. Human vision seems effortless, but machine vision requires substantial computation to approach the same effectiveness. Visual object recognition is a common goal for machine vision, often achieved with deep learning and generic object detection. This thesis focuses on robot guidance systems comprising a robot with a gripper on its arm, a camera that acquires images of the world, boxes to detect in one or more layers, and software that applies a generic object detection model to detect the boxes. The performance of a robot guidance system is affected by many variables, including environmental, camera, object, and robot gripper aspects. A survey was conducted to gather professionals' feedback on the thresholds a detection from the model must meet to count as correct, given that a detection must correspond to an actual object the robot can pick up. This thesis implements precision, recall, average precision at a specific threshold, average precision over a range of thresholds, localization-recall-precision error, and a manually constructed score, based on the survey results, for the robot's ability to pick up an object from the information provided by a detection, called the pickability score. These metrics are implemented within a tool intended for analyzing different models' performance on varying datasets. The values of all metrics for the applied dataset are presented in the results, and the metrics are discussed with regard to what information they convey for a robot guidance system. The conclusion is to use each metric for what it does best: the average precision metrics for evaluating model performance, and the pickability scores, with extended features for the robot gripper, for pickability evaluation.
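The core computations behind several of these metrics are compact enough to sketch. The following is an illustrative, greedy-matching version (real average precision implementations differ in matching and interpolation details):

```python
# Minimal sketch of IoU and precision/recall at a chosen IoU threshold.
# Boxes are (x1, y1, x2, y2); greedy one-to-one matching, for illustration only.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def precision_recall(predictions, ground_truth, iou_thr=0.5):
    """predictions: [{'box': ..., 'score': ...}]; ground_truth: [box, ...]."""
    matched, tp = set(), 0
    for pred in sorted(predictions, key=lambda p: -p["score"]):
        best = max(range(len(ground_truth)),
                   key=lambda i: iou(pred["box"], ground_truth[i]), default=None)
        if best is not None and best not in matched \
                and iou(pred["box"], ground_truth[best]) >= iou_thr:
            matched.add(best)
            tp += 1
    return tp / max(len(predictions), 1), tp / max(len(ground_truth), 1)
```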
207

Smartphone Based Object Detection for Shark Spotting

Oliver, Darrick W, 01 November 2023
Given concern over shark attacks in coastal regions, unmanned aerial vehicles (UAVs), or drones, are increasingly used to ensure the safety of beachgoers. However, much of city officials' process remains manual, with drone operation and review of footage still playing a significant role. In pursuit of a more automated solution, researchers have turned to neural networks to detect sharks and other marine life. For on-device solutions, this has historically required assembling individual hardware components into an embedded system: the camera, neural processing unit, and central processing unit are purchased and assembled separately, require specific drivers, and involve a lengthy setup process. Addressing these issues, we propose smartphones as a novel integrated solution for shark detection. This thesis uses an iPhone 14 Pro as the driving force for a YOLOv5-based model and compares our results to previous literature on shark object detection. We find that our system outperforms previous methods in both throughput and accuracy.
208

Edge Machine Learning for Wildlife Conservation : A part of the Ngulia project

Gotthard, Richard, Broström, Marcus, January 2023
The prominence of edge machine learning is increasing swiftly as the performance of microcontrollers continues to improve. By deploying object detection and classification models on edge devices with camera sensors, it becomes possible to locate and identify objects in their vicinity. This technology finds valuable applications in wildlife conservation, particularly in camera traps used in African sanctuaries, and specifically in the Ngulia sanctuary, to monitor endangered species and provide early warnings of potential intruders. When an animal crosses the path of an edge device equipped with a camera sensor, an image is captured, and the animal's presence and identity are subsequently determined. The performance of three distinct object detection models is evaluated: SSD MobileNetV2, FOMO MobileNetV2, and YOLOv5. Furthermore, the compatibility of these models with three different microcontrollers is explored: the ESP32 TimerCam from M5Stack, the Sony Spresense, and the LILYGO T-Camera S3 ESP32-S. The deployment of over-the-air (OTA) updates to edge devices stationed in remote areas is presented, illustrating how an edge device, initially deployed with a model, can collect field data and be iteratively updated using an active learning pipeline. This project evaluates the performance of the three microcontrollers in conjunction with their respective camera sensors. A contribution of this work is a successful field deployment of a LILYGO T-Camera S3 ESP32-S running the FOMO MobileNetV2 model. The data captured by this setup feeds an active learning pipeline that iteratively retrains the FOMO MobileNetV2 model and updates the LILYGO T-Camera S3 ESP32-S with new firmware through OTA updates.
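The abstract's active learning pipeline can be sketched at a high level on the server side; every function name below is an illustrative stand-in for the project's actual components, and the confidence threshold is an assumption:

```python
# Hedged sketch of an active-learning + OTA loop: field images arrive from the
# camera traps, uncertain ones go to human review, the model is retrained, and
# new firmware is pushed back to the edge devices.

def active_learning_cycle(device_uploads, model, human_label,
                          train, build_firmware, push_ota):
    review_queue, auto_labeled = [], []
    for image in device_uploads:
        detections = model(image)
        confident = [d for d in detections if d["score"] >= 0.8]
        if confident:
            auto_labeled.append((image, confident))   # keep high-confidence labels
        else:
            review_queue.append(image)                # humans label the hard cases
    labeled = auto_labeled + [(img, human_label(img)) for img in review_queue]
    new_model = train(model, labeled)                 # e.g. retrain FOMO MobileNetV2
    push_ota(build_firmware(new_model))               # OTA update to the edge devices
    return new_model
```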
209

Accelerating Multi-target Visual Tracking on Smart Edge Devices

Nalaie, Keivan January 2023 (has links)
Multi-object tracking (MOT) is a key building block in video analytics and finds extensive use in surveillance, search and rescue, and autonomous driving applications. Object detection, a crucial stage in MOT, dominates the overall tracking inference time due to its reliance on deep neural networks (DNNs). Despite the superior performance of cutting-edge object detectors, their extensive computational demands limit their real-time application on embedded devices with constrained processing capabilities. Hence, we aim to reduce the computational burden of object detection while maintaining tracking performance. Our first approach adapts frame resolution to reduce computational complexity: during inference, frame resolution can be tuned to the complexity of the visual scene. We present DeepScale, a model-agnostic frame resolution selection approach that operates on top of existing fully convolutional network-based trackers. By analyzing the effect of frame resolution on detection performance, DeepScale strikes good trade-offs between detection accuracy and processing speed by adapting frame resolution on the fly. Our second approach enhances tracker efficiency through model adaptation. We introduce AttTrack, which expedites tracking by interleaving the execution of object detectors of different model sizes during inference: a sophisticated network (the teacher) runs only on keyframes, while for non-keyframes knowledge is transferred from the teacher to a smaller network (the student) to improve the latter's performance. Our third contribution exploits temporal-spatial redundancies to enable real-time multi-camera tracking. We propose the MVSparse pipeline, which consists of a central processing unit (on an edge server or in the cloud) that aggregates information from multiple cameras, and distributed lightweight reinforcement learning (RL) agents running on individual cameras that predict the informative blocks in the current frame based on past frames from the same camera and detection results from other cameras.
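As a hedged sketch of AttTrack's interleaving idea (simplified; the actual knowledge-transfer mechanism is more involved), a teacher detector runs only on keyframes and a student covers the frames in between:

```python
# Hedged sketch of keyframe interleaving: an expensive "teacher" detector runs
# on every key_interval-th frame; a cheaper "student" handles the rest, reusing
# context from the teacher's last keyframe. All names are illustrative.

def interleaved_detection(frames, teacher, student, key_interval=5):
    teacher_context = None
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            detections, teacher_context = teacher(frame)  # expensive, keyframes only
        else:
            # student reuses knowledge transferred from the last keyframe
            detections = student(frame, teacher_context)
        yield i, detections
```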
210

Real-time Optical Character Recognition in Steel Bars Using YOLOv5

Gattupalli, Monica January 2023 (has links)
Background. Identifying product quality in the manufacturing industry is a challenging task. Manufacturers use needles to print unique numbers on products to differentiate good-quality from bad-quality products. However, identifying these needle-printed characters can be difficult, so new technologies such as deep learning and optical character recognition (OCR) are used to identify them.
Objective. The primary objective of this thesis is to identify the needle-printed characters on steel bars. This objective is divided into two sub-objectives: first, to identify the region of interest on the steel bars and extract it from the images; second, to identify the characters on the steel bars from the extracted images. The YOLOv5 and YOLOv5-obb object detection algorithms are used to achieve these objectives.
Method. A literature review was first performed to select the algorithms. The dataset, provided by OVAKO, comprised 1000 old images and 3000 new images of steel bars. Existing OCR techniques were first applied to the old images but reached low accuracy, so YOLOv5 was used on the old images to detect the region of interest. Different rotation techniques were applied to the cropped images (cropped after the bounding box was detected) without promising results, and YOLOv5 applied at the character level also produced unsatisfactory results. YOLOv5-obb was therefore used on the new images, which resulted in good accuracy.
Results. Accuracy and mAP are used to assess the performance of the OCR techniques and the selected object detection algorithms. The existing OCR technique achieved an accuracy of 0%, meaning it failed to identify the characters. With a mAP of 0.95, YOLOv5 is good at extracting cropped images but fails to identify the characters. When YOLOv5-obb is used to handle orientation, it achieves a mAP of 0.93. Due to time constraints, the last part of the thesis was not implemented.
Conclusion. This research employed the YOLOv5 and YOLOv5-obb object detection algorithms to identify needle-printed characters on steel bars. The study objectives were met by first selecting the region of interest and then extracting the images. Finally, character-level identification was performed on the old images using YOLOv5 and on the new images using YOLOv5-obb, with promising results.
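A hedged sketch of the two-stage pipeline described here, using stock YOLOv5 via `torch.hub` with placeholder weight files ("region.pt", "chars.pt" are assumptions), and with the oriented-box (YOLOv5-obb) stage simplified to axis-aligned boxes:

```python
import torch

# Stage 1 finds the printed region on the bar; stage 2 detects individual
# characters inside the crop; sorting detections left-to-right recovers the
# string. Weight paths are placeholders, not the thesis's actual models.
region_model = torch.hub.load("ultralytics/yolov5", "custom", path="region.pt")
char_model = torch.hub.load("ultralytics/yolov5", "custom", path="chars.pt")

def read_bar_id(image):
    """image: HxWxC numpy array (e.g. from cv2). Returns the printed string."""
    regions = region_model(image).xyxy[0]        # rows: (x1, y1, x2, y2, conf, cls)
    if len(regions) == 0:
        return None
    x1, y1, x2, y2 = map(int, regions[0][:4])    # take the top-scoring region
    crop = image[y1:y2, x1:x2]
    chars = char_model(crop).pandas().xyxy[0]    # one row per detected character
    chars = chars.sort_values("xmin")            # read left to right
    return "".join(chars["name"].tolist())
```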
