1.
Detekce pohybujících se objektů ve videu s využitím neuronových sítí / Object detection in video using neural networks. Mikulský, Petr. January 2021.
This diploma thesis deals with the detection of moving objects in video recordings using neural networks. The aim of the thesis was to detect road users in video recordings. A pre-trained YOLOv5 object detection model was used for the practical part of the thesis. As part of the solution, a custom dataset of road traffic video recordings was created and annotated with the following classes: car, bus, van, motorcycle, truck, and trailer truck. The final version of this dataset comprises 5404 frames and 6467 annotated objects in total. After training, the YOLOv5 model achieved 0.995 mAP, 0.995 precision, and 0.986 recall on the dataset. All steps leading to the final form of the dataset are described in the conclusion chapter.
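Detection precision and recall are computed by matching predicted boxes to annotated boxes. The sketch below shows that step under one assumption: the common rule that a prediction counts as a true positive when its intersection over union (IoU) with an unused ground-truth box is at least 0.5. This is not necessarily the thesis's exact evaluation setup. YOLOv5's own validation code is more involved and additionally averages precision over confidence thresholds to obtain mAP. The helper names `Box`, `iou`, `count_matches` and `precision_recall` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x1, y1) is the top-left corner, (x2, y2) the bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Negative extents (i.e. no overlap) clamp to zero area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def count_matches(preds, truths, threshold=0.5):
    """Greedily match predictions (assumed sorted by descending confidence) to unused ground-truth boxes."""
    used = set()
    tp = 0
    for p in preds:
        best_j, best_score = None, threshold
        for j, t in enumerate(truths):
            if j in used:
                continue
            score = iou(p, t)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
            tp += 1
    return tp, len(preds) - tp, len(truths) - tp  # (TP, FP, FN)


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For the class-specific results reported above, the counts would be accumulated per class over all frames before computing precision and recall.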
2.
Self-supervised učení v aplikacích počítačového vidění / Self-supervised learning in computer vision applications. Vančo, Timotej. January 2021.
The aim of the diploma thesis is to research self-supervised learning (SSL) in computer vision applications, then to choose a suitable test task with an extensive dataset, apply self-supervised methods, and evaluate them. The theoretical part focuses on describing methods in computer vision, giving a detailed description of neural and convolutional networks, and providing an extensive explanation and classification of self-supervised methods. The conclusion of the theoretical part is devoted to practical applications of self-supervised methods. The practical part describes the code created for working with datasets and the application of the SSL methods Rotation, SimCLR, MoCo, and BYOL to classification and semantic segmentation. Each application of a method is explained in detail and evaluated for various parameters on the large STL10 dataset. Subsequently, the performance of the methods is evaluated on different datasets, and the limiting conditions in the classification task are identified. The practical part concludes with the application of SSL methods to pre-train the encoder for semantic segmentation on the Cityscapes dataset.
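SimCLR, one of the methods listed above, trains an encoder so that two augmented views of the same image produce similar projections, while views of different images are pushed apart. It does this with the NT-Xent (normalized temperature-scaled cross entropy) loss. The sketch below computes the forward pass of that loss in NumPy. It is a minimal illustration only:

- there are no gradients;
- the augmentations and the encoder are omitted;
- the temperature of 0.5 is an assumed illustrative value, not a setting taken from the thesis.

Rotation, MoCo and BYOL use different objectives.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss as used by SimCLR (forward pass only).

    z1, z2: (N, D) projections of two augmented views. Row i of z1 and row i of z2
    come from the same image (the positive pair); every other row acts as a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors, so dot product = cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                      # exclude each sample's similarity with itself
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    row_max = sim.max(axis=1, keepdims=True)            # subtract the row max for a stable log-sum-exp
    log_denominator = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_positive = sim[np.arange(2 * n), positives] - log_denominator
    return float(-log_prob_positive.mean())
```

With perfectly aligned views the loss is small. Swapping the pairings raises it, which is the signal that drives the encoder during pre-training.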
3.
The clash between two worlds in human action recognition: supervised feature training vs Recurrent ConvNet. Raptis, Konstantinos. 28 November 2016.
Indiana University-Purdue University Indianapolis (IUPUI) / Action recognition has been an active research topic for over three decades. Its applications include surveillance, human-computer interaction, and content-based retrieval. Recent research focuses on datasets of movies, web videos, and TV shows. The nature of these datasets makes action recognition very challenging because of scene variability and complexity, namely background clutter, occlusions, viewpoint changes, fast irregular motion, and a large spatio-temporal search space (articulation configurations and motions). Local space-time image features show promising results because they avoid cumbersome and often inaccurate frame-by-frame segmentation (boundary estimation). We focus on two state-of-the-art methods for the action classification problem: dense trajectories and recurrent neural networks (RNNs). Dense trajectories use typical supervised training (e.g., with support vector machines) of features such as 3D-SIFT, extended SURF, HOG3D, and local trinary patterns; the main idea is to densely sample these features in each frame and track them through the sequence based on optical flow. The deep neural network, on the other hand, uses the input frames to detect actions and produce part proposals, i.e., to estimate information about body parts (shapes and locations). We compare these two approaches, which are representative of current practice, both qualitatively and numerically, and describe our conclusions with respect to accuracy and efficiency.
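The tracking step of dense trajectories can be sketched as follows. Points are sampled on a regular grid, and each point is moved frame to frame by the median-filtered dense optical flow at its position. This is a minimal sketch under several assumptions:

- the flow fields are precomputed by some dense optical flow algorithm;
- the 3 × 3 median kernel and the grid step are illustrative choices;
- trajectory pruning and descriptor extraction along the tracks are omitted.

The function names are hypothetical.

```python
import numpy as np


def median_filter3(a):
    """3x3 median filter with edge padding, in plain NumPy."""
    h, w = a.shape
    p = np.pad(a, 1, mode="edge")
    windows = [p[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.median(np.stack(windows), axis=0)


def dense_grid(h, w, step):
    """Sample (x, y) points on a regular grid every `step` pixels."""
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    return np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)


def track(points, flows):
    """Advect points through a sequence of dense flow fields.

    flows: list of (H, W, 2) arrays holding per-pixel (dx, dy) displacements
    from frame t to frame t + 1. Returns an array of shape (T + 1, P, 2).
    """
    trajectory = [points.copy()]
    for flow in flows:
        h, w = flow.shape[:2]
        dx = median_filter3(flow[..., 0])   # median filtering suppresses isolated flow outliers
        dy = median_filter3(flow[..., 1])
        xi = np.clip(np.rint(points[:, 0]).astype(int), 0, w - 1)
        yi = np.clip(np.rint(points[:, 1]).astype(int), 0, h - 1)
        points = points + np.stack([dx[yi, xi], dy[yi, xi]], axis=1)
        trajectory.append(points.copy())
    return np.stack(trajectory)
```

Local descriptors would then be computed in a space-time volume around each resulting track and fed to the supervised classifier.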
4.
RMNv2: Reduced Mobilenet V2 an Efficient Lightweight Model for Hardware Deployment. Ayi, Maneesh. 05 1900.
Indiana University-Purdue University Indianapolis (IUPUI) / Humans can see and differentiate objects easily, but for computers it is not that easy. Computer vision is an interdisciplinary field that enables computers to understand digital images and videos and to differentiate the objects in them. Since the introduction of CNNs/DNNs, computer vision has been used extensively in applications such as ADAS, robotics, and autonomous systems. This thesis proposes an architecture, RMNv2, that is well suited to computer vision applications such as ADAS.
RMNv2 is a modified version of its original architecture, Mobilenet V2. The modifications include disabling downsampling layers, heterogeneous kernel-based convolutions, Mish activation, and auto augmentation. The proposed model is trained from scratch on the CIFAR10 dataset and achieves an accuracy of 92.4% with 1.06M parameters in total. Its model size is 4.3 MB, a decrease of about 52.2% from the original implementation. Because of its small size and competitive accuracy, the proposed model can easily be deployed on resource-constrained devices, such as mobile and embedded devices, for applications like ADAS. Furthermore, the proposed model is implemented on the real-time embedded devices NXP Bluebox 2.0 and NXP i.MX RT1060 for image classification tasks.
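Two details above can be made concrete.

- **Mish activation:** it is defined as x · tanh(softplus(x)), where softplus(x) = ln(1 + eˣ).
- **Model size:** the size follows roughly from the parameter count. The helper below assumes 4-byte float32 weights, which the abstract does not state.

The helper names are hypothetical.

```python
import numpy as np


def mish(x):
    """Mish activation: x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)."""
    x = np.asarray(x, dtype=np.float64)
    return x * np.tanh(np.logaddexp(0.0, x))   # logaddexp(0, x) is a numerically stable softplus


def float32_weights_mb(num_params):
    """Raw weight storage at 4 bytes per float32 parameter, in decimal megabytes."""
    return num_params * 4 / 1e6
```

Under these assumptions, `float32_weights_mb(1.06e6)` gives 4.24 MB of raw weights. The small gap to the reported 4.3 MB may come from file-format overhead or non-parameter buffers, but the abstract does not say. A 52.2% reduction to 4.3 MB also implies an original size of about 4.3 / (1 − 0.522) ≈ 9.0 MB.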
5.
Electroencephalography and biomechanics of the basketball throw. Phan, Phong Ky. 08 December 2023.
According to various studies, experts exhibit superior integration of perceptual, cognitive, and motor skills compared with novice athletes. This superior ability has been associated with the focused and efficient organization of task-related neural networks. Specifically, when performing within their domain of expertise, skilled individuals show spatially localized or relatively lower brain activity, a response characterized as ‘neural efficiency’. Previous work has also suggested that elite basketball players can predict successful free throws more rapidly and accurately from cues in body kinematics. These traits result from prolonged training of specific motor skills and focused excitability of the motor cortex during the reaction, movement planning, and execution phases. However, because of motion artifacts during movement initiation and execution, knowledge of the underlying mechanism and of the connection between the brain's neural networks and the body's musculoskeletal system is still limited. Thus, the objective of this study is to use electroencephalography (EEG) and motion capture (MoCap) systems to advance the current understanding of the relationships between neurophysiological activity and human biomechanics, as well as their effects on the success rate of motor skills.
The project has three specific aims. The first aim was to integrate the EEG and MoCap systems to analyze and compare successful and unsuccessful basketball throws. The second aim used convolutional neural networks (CNNs) as an alternative approach to predict the outcome of a shot from the recorded EEG signals and biomechanical parameters. Lastly, the third aim identified the impact of each EEG channel and MoCap parameter on the CNN models using ablation methods. The results of this study offer a practical approach to analyzing the factors behind elite athletes' better performance in various sport-related tasks. Moreover, the acquired data can contribute to a deeper understanding of the biomechanical and neurological factors that directly affect elite athletes' performance during successful outcomes. They thus provide vital information for improving athletic performance overall and for guiding sport-specific training.
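One simple form of the ablation analysis in the third aim works as follows: zero out one input channel at a time and measure how much a trained model's accuracy drops. The sketch below assumes the following, none of which is taken from the thesis:

- the inputs are laid out as (samples, channels, time);
- the model is any `predict` callable;
- zero-masking is the ablation method. The thesis may instead, for example, retrain the CNN without each input.

`channel_ablation` is a hypothetical name.

```python
import numpy as np


def channel_ablation(predict, X, y):
    """Accuracy drop caused by zeroing out each input channel in turn.

    predict: callable mapping inputs of shape (N, C, T) to predicted labels of shape (N,).
    X: inputs, e.g. C EEG channels or MoCap parameters over T time samples (hypothetical layout).
    y: true labels of shape (N,), e.g. 1 = made shot, 0 = missed shot.
    """
    baseline = float(np.mean(predict(X) == y))
    drops = np.empty(X.shape[1])
    for c in range(X.shape[1]):
        X_ablated = X.copy()
        X_ablated[:, c, :] = 0.0                 # remove the information carried by channel c
        drops[c] = baseline - np.mean(predict(X_ablated) == y)
    return baseline, drops
```

Channels whose removal causes the largest drop are the ones the model relies on most. Channels with a drop near zero contribute little to the prediction.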
6.
Development of Surrogate Model for FEM Error Prediction using Deep Learning. Jain, Siddharth. 07 July 2022.
This research is a proof-of-concept study to develop a surrogate model, using deep learning (DL), that predicts the solution error for a given model with a given mesh. We take von Mises stress contours as input and predict two different types of error indicator contours: (i) the von Mises error indicator (MISESERI) and (ii) the energy density error indicator (ENDENERI). Error indicators are designed to identify the areas of the solution domain where the gradient has not been properly captured. They estimate the error from the spatial gradient distribution of the existing solution for a given mesh. Because errors arise from poor meshing and from the nature of the finite element method, these error indicators are used in an adaptive remeshing scheme to study and reduce errors in the finite element solution. Adaptive remeshing is an iterative and computationally expensive process that reduces the error computed during the post-processing step. To overcome this limitation, we propose replacing it with data-driven techniques. We introduce an image-processing-based surrogate model that solves an image-to-image regression problem using convolutional neural networks (CNNs). It takes a 256 × 256 color image of a von Mises stress contour and outputs the required error indicator. To train this model with good generalization performance, we developed four different geometries for each of three case studies: (i) a quarter plate with a hole, (ii) a simply supported plate with multiple holes, and (iii) a simply supported stiffened plate. The research is implemented in three phases. Phase I involves the design and development of a CNN trained on stress contour images paired with their corresponding von Mises stress values, volume-averaged over the entire domain.
Phase II involves developing a surrogate model to perform image-to-image regression, and the final Phase III extends the capabilities of Phase II to make the surrogate model more generalized and robust. The final surrogate model, trained on a global dataset of 12,000 images, consists of three autoencoders, one encoder-decoder assembly, and two multi-output regression neural networks. With a training error of less than 1%, the neural network shows good memorization and generalization performance. Our final surrogate model takes 15.5 hours to train and less than a minute to predict the error indicators on the testing datasets. Thus, this study can be considered a good first step toward developing an adaptive remeshing scheme using deep neural networks. / Master of Science / This research is a proof-of-concept study to develop an image-processing-based neural network (NN) model to solve an image-to-image regression problem. In finite element analysis (FEA), error indicators are used to study and reduce the errors that arise from poor meshing and from the nature of the finite element method. In this research, we predicted two different types of error indicator contours using stress images as inputs to the NN model. In popular FEA packages, an adaptive remeshing scheme optimizes mesh quality by iteratively computing error indicators, which makes the process computationally expensive. To overcome this limitation, we propose replacing it with convolutional neural networks (CNNs), which are particularly suited to image-based data. To train our CNN model with good generalization performance, we developed four different geometries with varying load cases. The research is implemented in three phases. Phase I involves the design and development of a CNN model for initial training on small images.
Phase II involves developing an assembled neural network to perform image-to-image regression, and the final Phase III extends the capabilities of Phase II for more generalized and robust results. With a training error of less than 1%, the neural network shows good memorization and generalization performance. Our final surrogate model takes 15.5 hours to train and less than a minute to predict the error indicators on the testing datasets. Thus, this study can be considered a good first step toward developing an adaptive remeshing scheme using deep neural networks.
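The Phase I regression target combines two standard quantities:

- **von Mises equivalent stress:** computed from the Cauchy stress components as σ_vm = √(½[(σxx − σyy)² + (σyy − σzz)² + (σzz − σxx)²] + 3(τxy² + τyz² + τzx²)).
- **Volume average:** the element values weighted by element volume.

The sketch below computes both for per-element arrays. Defaulting the out-of-plane components to zero (a plane-stress case) is an illustrative assumption. The error indicators MISESERI and ENDENERI themselves are solver-specific and are not reproduced here. The function names are hypothetical.

```python
import numpy as np


def von_mises(sxx, syy, szz=0.0, txy=0.0, tyz=0.0, tzx=0.0):
    """Von Mises equivalent stress from Cauchy stress components (element-wise on arrays)."""
    return np.sqrt(0.5 * ((sxx - syy) ** 2 + (syy - szz) ** 2 + (szz - sxx) ** 2)
                   + 3.0 * (txy ** 2 + tyz ** 2 + tzx ** 2))


def volume_average(values, volumes):
    """Volume-weighted mean of per-element values over the whole domain."""
    values = np.asarray(values, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    return float(np.sum(values * volumes) / np.sum(volumes))
```

Two sanity checks follow from the formula. A uniaxial stress of 100 gives σ_vm = 100, and a pure shear of 50 gives 50√3 ≈ 86.6.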
7.
Nízko-dimenzionální faktorizace pro "End-To-End" řečové systémy / Low-Dimensional Matrix Factorization in End-To-End Speech Recognition Systems. Gajdár, Matúš. January 2020.
The project covers automatic speech recognition with neural networks trained using low-dimensional matrix factorization. We describe time delay neural networks with factorization (TDNN-F) and without it (TDNN), implemented in PyTorch. We compare the PyTorch implementation with the Kaldi toolkit and achieve similar results in experiments with various network architectures. The last chapter describes the impact of low-dimensional matrix factorization on end-to-end speech recognition systems, as well as a modification of the system with TDNN(-F) networks. With specific network settings, systems using factorization achieved better results. Additionally, TDNN(-F) networks reduced the complexity of training by decreasing the number of network parameters.
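The parameter saving from low-dimensional factorization comes from replacing one m × n weight matrix with two factors of sizes m × r and r × n. This cuts the weight count from m·n to r·(m + n) when the rank r is small. The sketch below obtains such factors from an existing matrix by truncated SVD. This is a swapped-in illustration: TDNN-F as implemented in Kaldi instead trains the two factors directly, keeping one of them semi-orthogonal. The layer sizes 1536 and 256 used below are hypothetical.

```python
import numpy as np


def low_rank_factorize(W, rank):
    """Approximate W (m x n) as A @ B, A: (m x rank), B: (rank x n), via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # fold the singular values into the first factor
    B = Vt[:rank, :]
    return A, B


def param_counts(m, n, rank):
    """Weights in the full layer versus the factorized pair of layers (biases ignored)."""
    return m * n, rank * (m + n)
```

For a hypothetical 1536 × 1536 layer with a bottleneck of 256, `param_counts(1536, 1536, 256)` gives 2,359,296 weights before and 786,432 after factorization, about a third.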
8.
Odstraňování šumu v obraze pomocí metod hlubokého učení / Removing noise in images using deep learning methods. Strejček, Jakub. January 2021.
This thesis compares deep learning denoising methods and their implementation. In the last few years, it has become clear that paired data (noisy and clean images) is not necessary to train convolutional neural networks for denoising; in particular cases, noisy images alone are sufficient. The methods described in this thesis can effectively remove, e.g., additive Gaussian noise, and can even achieve better results than the statistical methods currently used for denoising.
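Training on noisy images alone rests on a simple statistical fact, used for example by Noise2Noise-style methods. The abstract does not say which such methods the thesis compares. Under a squared-error loss, the best prediction for a set of targets is their mean. The mean of many copies of an image, each corrupted by independent zero-mean noise, converges to the clean image. The NumPy snippet below is not a denoising network. It only demonstrates that fact, with an assumed noise level σ = 0.2 and 400 noisy copies.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.2
clean = rng.uniform(0.0, 1.0, size=(32, 32))     # stand-in "clean" image, never used as a target

# Many independent noisy observations of the same scene, each with zero-mean Gaussian noise.
noisy = clean + rng.normal(0.0, sigma, size=(400, 32, 32))

# Under a squared-error loss, the best prediction for a set of targets is their mean,
# so fitting noisy targets pulls the estimate toward the clean image.
l2_optimum = noisy.mean(axis=0)

mse_single = np.mean((noisy[0] - clean) ** 2)     # close to sigma**2 = 0.04
mse_optimum = np.mean((l2_optimum - clean) ** 2)  # close to sigma**2 / 400
```

The argument depends on the noise being zero-mean. For other noise models, a different loss (for example L1 for median-preserving noise) is needed.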
9.
HBONext: An Efficient DNN for Light Edge Embedded Devices. Sanket Ramesh Joshi (10716561). 10 May 2021.
Every year, the most effective deep learning models, namely CNN architectures, are showcased based on their compatibility and performance on embedded edge hardware, especially for applications such as image classification. These deep learning models require a significant amount of computation and memory, so they can typically only run on high-performance computing systems such as CPUs or GPUs. As a result, they often struggle to meet portable specifications because of resource, energy, and real-time constraints. Hardware accelerators have recently been designed to provide the computational resources that AI and machine learning tools need. These edge accelerators have high-performance hardware that helps maintain the precision needed for this task. Furthermore, image classification architectures that model channel interdependencies with either depth-wise or group-wise convolutional features have benefited from the inclusion of bottleneck modules. Because of its increasing use in portable applications, the classic inverted residual block, a well-known architectural technique, has gained more recognition. This work goes a step further by introducing a design method for porting CNNs to low-resource embedded systems, bridging the gap between deep learning models and embedded edge systems. To achieve these goals, we use closer computing strategies to reduce computational load and memory usage while retaining excellent deployment efficiency. This thesis introduces HBONext, a mutated version of Harmonious Bottlenecks (DHbneck) combined with a flipped version of the inverted residual (FIR) block. HBONext outperforms the current HBONet architecture in terms of accuracy and model size miniaturization. Unlike the current definition of the inverted residual, the FIR block performs identity mapping and spatial transformation at its higher dimensions.
The HBO approach, on the other hand, focuses on two orthogonal dimensions: spatial (H/W) contraction-expansion followed by channel (C) expansion-contraction, both organized in a bilaterally symmetric manner. HBONext is a version designed specifically for embedded and mobile applications. In this research work, we also show how to use NXP Bluebox 2.0 to build a real-time HBONext image classifier. Integrating the model into this hardware proved highly successful owing to its limited model size of 3 MB. The model was trained and validated on the CIFAR10 dataset and performed exceptionally well given its smaller size and higher accuracy. The baseline HBONet architecture has a validation accuracy of 80.97% and a model size of 22 MB. The proposed HBONext variants, on the other hand, achieved a higher validation accuracy of 89.70% with a model size of 3.00 MB, measured using the number of parameters. The performance metrics of the HBONext architecture and its variants are compared in the following chapters.
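The bottleneck and inverted-residual designs discussed above save parameters mainly by replacing dense k × k convolutions with depthwise ones. The sketch below counts weights for three layouts:

- a standard convolution;
- a depthwise-separable convolution;
- a MobileNetV2-style inverted residual block (1 × 1 expansion, k × k depthwise, 1 × 1 linear projection).

It does not reproduce HBONext's FIR block or the Harmonious Bottleneck, whose exact layouts the abstract does not give. The helper names and the example channel counts are hypothetical.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights of a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def inverted_residual_params(c, expansion, k=3):
    """Weights of a MobileNetV2-style inverted residual block with c input and output channels
    (batch-norm parameters and biases ignored)."""
    hidden = c * expansion
    expand = c * hidden          # 1 x 1 expansion to the wide hidden layer
    depthwise = k * k * hidden   # k x k depthwise convolution at the expanded width
    project = hidden * c         # 1 x 1 linear projection back to c channels
    return expand + depthwise + project
```

At 64 input and output channels with 3 × 3 kernels, a standard convolution needs 36,864 weights and a depthwise-separable one only 4,672, nearly 8 times fewer. If the reported sizes assume 4 bytes per parameter (the abstract does not say), 3.00 MB would correspond to about 750,000 parameters and 22 MB to about 5.5 million.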
10.
Zlepšování kvality digitalizovaných textových dokumentů / Document Quality Enhancement. Trčka, Jan. January 2020.
The aim of this work is to increase the accuracy of text document transcription. The work focuses mainly on texts printed on degraded materials such as newspapers or old books. To address this problem, current methods and the problems associated with text recognition are analyzed. Based on this analysis, a method based on the GAN network architecture is chosen and implemented. Experiments are performed on these networks to find their appropriate size and learning parameters. Subsequently, testing is performed to compare different learning methods and their results. Both training and testing are performed on an artificial dataset. The implemented trained networks increase transcription accuracy from 65.61 % for the raw damaged text lines to 93.23 % for lines processed by the network.
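The abstract reports transcription accuracy without defining it. A common choice for OCR is character-level accuracy: 1 minus the character error rate, where the error rate is the Levenshtein edit distance divided by the reference length. The sketch below implements that metric as an assumption, not as the thesis's confirmed definition. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution (free if characters match)
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """1 - CER, clamped at 0 because a very long hypothesis can exceed the reference length in edits."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - levenshtein(reference, hypothesis) / len(reference))
```

For example, transcribing the reference "old book" as "o1d bock" costs two substitutions out of eight characters, giving an accuracy of 0.75.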