1.
Self-supervised monocular image depth learning and confidence estimation. Chen, L., Tang, W., Wan, Tao Ruan, John, N.W. 17 June 2020.
We present a novel self-supervised framework for monocular image depth learning and confidence estimation. Our framework reduces the amount of ground-truth annotation data required for training Convolutional Neural Networks (CNNs), which often poses a challenge for the fast deployment of CNNs in many computer vision tasks. Our DepthNet adopts a novel, fully differentiable patch-based cost function built on the Zero-Mean Normalized Cross Correlation (ZNCC), using multi-scale patches for matching and learning. This approach greatly increases the accuracy and robustness of the depth learning. The patch-based cost function naturally provides a 0-to-1 confidence, which is then used to self-supervise the training of a parallel network for confidence map learning and estimation, exploiting the fact that ZNCC is a normalized measure of similarity that can be treated as a proxy for the confidence of the depth estimate. The confidence map learning and estimation therefore operate in a self-supervised manner, in a network parallel to the DepthNet. Evaluations on the KITTI depth prediction evaluation dataset and the Make3D dataset show that our method outperforms the state-of-the-art results.
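The ZNCC at the heart of the cost function is compact enough to sketch directly; a minimal NumPy version is below (the patch size and the 0-to-1 mapping are illustrative assumptions, not details from the thesis):

```python
import numpy as np

def zncc(patch_a: np.ndarray, patch_b: np.ndarray, eps: float = 1e-8) -> float:
    """Zero-Mean Normalized Cross Correlation between two same-sized patches.

    Returns a similarity in [-1, 1]; ZNCC is invariant to affine intensity
    changes, which is what makes it a robust matching cost.
    """
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    return float((a * b).sum() / (np.sqrt((a ** 2).sum() * (b ** 2).sum()) + eps))

# Example: a 7x7 patch against an intensity-shifted copy of itself.
rng = np.random.default_rng(0)
left = rng.random((7, 7))
right = 1.3 * left + 0.1            # brightness/contrast change
print(zncc(left, right))            # ~1.0: a strong match
print((1 + zncc(left, right)) / 2)  # one way to map similarity to a 0-to-1 confidence
```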
2.
Deep Learning Approach to Trespass Detection using Video Surveillance Data. Bashir, Muzammil. 22 April 2019.
While railroad trespassing is a dangerous activity with significant security and safety risks, regular patrolling of potential trespassing sites is infeasible due to exceedingly high resource demands and personnel costs. There is thus a need for an automated trespass detection and early-warning tool leveraging state-of-the-art machine learning techniques. Leveraging video surveillance from security cameras, this thesis designs a novel approach called ARTS (Automated Railway Trespassing detection System) that tackles the problem of detecting trespassing activity. In particular, we adopt a CNN-based deep learning architecture (Faster R-CNN) as the core component of our solution. However, such deep learning methods, while effective, are computationally expensive and time-consuming, especially when applied to large amounts of surveillance data. Given the sparsity of railroad trespassing activity, we design a dual-stage deep learning architecture composed of an inexpensive prefiltering stage for activity detection followed by a high-fidelity trespass detection stage for robust classification. The former filters out frames that show little to no activity, reducing the amount of data to be processed by the latter, more compute-intensive stage, which adopts state-of-the-art Faster R-CNN to ensure effective classification of trespassing activity. The resulting dual-stage architecture represents a flexible solution capable of trading off accuracy and computational time. We demonstrate the efficacy of our approach on a public-domain surveillance dataset.
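The abstract leaves the prefiltering stage unspecified beyond "activity detection"; a minimal sketch of the dual-stage idea, with simple frame differencing standing in for the inexpensive first stage and `detector` standing in for the Faster R-CNN second stage:

```python
import numpy as np

def has_activity(prev: np.ndarray, frame: np.ndarray,
                 pixel_thresh: float = 25.0, area_thresh: float = 0.001) -> bool:
    """Cheap stage 1: flag a frame only if enough pixels changed since the last one."""
    diff = np.abs(frame.astype(np.float32) - prev.astype(np.float32))
    return (diff > pixel_thresh).mean() > area_thresh

def dual_stage(frames, detector):
    """Run the expensive detector only on frames that pass the activity prefilter."""
    prev = frames[0]
    for frame in frames[1:]:
        if has_activity(prev, frame):
            yield frame, detector(frame)  # stage 2: high-fidelity trespass detection
        prev = frame
```

Because trespassing is sparse, most frames never reach stage 2, which is where the compute savings come from.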
3.
Automatic Eye-Gaze Following from 2-D Static Images: Application to Classroom Observation Video Analysis. Aung, Arkar Min. 23 April 2018.
In this work, we develop an end-to-end neural network-based computer vision system to automatically identify where each person within a 2-D image of a school classroom is looking ("gaze following"), as well as whom she/he is looking at. Automatic gaze following could help facilitate data mining of the large datasets of classroom observation videos that are collected routinely in schools around the world, in order to understand social interactions between teachers and students. Our network is based on the architecture by Recasens, et al. (2015) but is extended to (1) predict not only where, but also whom, the person is looking at; and (2) predict whether each person is looking at a target inside or outside the image. Since our focus is on classroom observation videos, we collect a gaze dataset (48,907 gaze annotations over 2,263 classroom images) for students and teachers in classrooms. Results of our experiments indicate that the proposed neural network can estimate the gaze target, either the spatial location or the face of a person, with substantially higher accuracy than several baselines.
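For orientation, the two-pathway design of Recasens et al. combines a saliency map from the full image with a gaze mask from the head crop and head position, multiplied element-wise; a toy PyTorch sketch of that structure (all layer sizes are illustrative assumptions, not the thesis network):

```python
import torch
import torch.nn as nn

class TwoPathwayGazeNet(nn.Module):
    """Saliency pathway (full image) modulated by a gaze pathway (head crop + position)."""
    def __init__(self):
        super().__init__()
        self.saliency = nn.Sequential(          # full image -> coarse saliency map
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 1, 5, stride=2, padding=2))
        self.gaze = nn.Sequential(              # head crop -> gaze features
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.gaze_mask = nn.Linear(16 + 2, 56 * 56)  # +2 for normalized (x, y) head position

    def forward(self, image, head_crop, head_xy):
        sal = self.saliency(image)                       # (B, 1, 56, 56) for 224x224 input
        g = torch.cat([self.gaze(head_crop), head_xy], dim=1)
        mask = self.gaze_mask(g).view(-1, 1, 56, 56)
        return sal * mask                                # element-wise combination

net = TwoPathwayGazeNet()
out = net(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224), torch.rand(1, 2))
print(out.shape)  # torch.Size([1, 1, 56, 56]) heatmap over gaze targets
```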
4.
Exploration and Comparison of Image-Based Techniques for Strawberry Detection. Liu, Yongxin. 01 September 2020.
Strawberries are an important cash crop in California, and their supply accounts for 80% of the US market [2]. However, in current practice, strawberries are picked manually, which is very labor-intensive and time-consuming. In addition, farmers need to hire an appropriate number of laborers to harvest the berries based on the estimated volume: overestimating the yield wastes labor, while underestimating it leaves part of the harvest unpicked [3]. Therefore, accurately estimating harvest volume in the field is important to farmers. This thesis focuses on image-based solutions to detect strawberries in the field, using both traditional computer vision techniques and deep learning methods.
When strawberries are in different growth stages, there are considerable differences in their color. Therefore, various color spaces are first studied in this work, and the most effective color components are used in detecting strawberries and differentiating mature and immature strawberries.
In some color channels, such as the R channel of the RGB color model, the Hue channel of the HSV color model, and the 'a' channel of the Lab color model, the pixels belonging to ripe strawberries are clearly distinguished from the background pixels. Thus, a color-based K-means clustering algorithm is exploited to detect red strawberries, achieving a 90.5% true-positive rate. For detecting unripe strawberries, this thesis first trains a Support Vector Machine classifier on HOG features. After optimizing the classifier through hard negative mining, the true-positive rate reaches 81.11%.
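A color-based K-means segmentation of ripe berries can be sketched in a few lines; below, a hand-rolled k-means clusters a crude redness score (R minus G), an assumption standing in for the thesis's chosen color channel:

```python
import numpy as np

def kmeans(points: np.ndarray, k: int = 2, iters: int = 20, seed: int = 0):
    """Minimal k-means over an (N, C) array of color features."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        labels = ((points[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.stack([points[labels == i].mean(0) for i in range(k)])
    return labels, centers

image = np.random.randint(0, 256, (120, 160, 3))            # stand-in for a field photo
redness = (image[..., 0].astype(float) - image[..., 1]).reshape(-1, 1)
labels, centers = kmeans(redness, k=2)
# The cluster whose center has the higher redness is taken as 'ripe strawberry'.
ripe_mask = (labels == centers.ravel().argmax()).reshape(image.shape[:2])
print(ripe_mask.mean())  # fraction of pixels flagged as ripe
```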
Finally, for the deep learning approach, two detectors based on different pre-trained models were trained using the TensorFlow Object Detection API, accelerated on an Amazon Web Services GPU instance. On images of a single strawberry plant, they achieved true-positive rates of 89.2% and 92.3%, respectively; on field images with multiple plants, they reached 85.5% and 86.3%.
5.
Towards Explainable Decision-making Strategies of Deep Convolutional Neural Networks: An exploration into explainable AI and potential applications within cancer detection. Hammarström, Tobias. January 2020.
The influence of Artificial Intelligence (AI) on society is increasing, with applications in highly sensitive and complicated areas. Examples include using Deep Convolutional Neural Networks within healthcare for diagnosing cancer. However, the inner workings of such models are often unknown, limiting the much-needed trust in the models. To combat this, Explainable AI (XAI) methods aim to provide explanations of the models' decision-making. Two such methods, Spectral Relevance Analysis (SpRAy) and Testing with Concept Activation Vectors (TCAV), were evaluated on a deep learning model classifying cat and dog images that contained introduced artificial noise. The task was to assess the methods' capabilities to explain the importance of the introduced noise for the learnt model. The task was constructed as an exploratory step, with the future aim of using the methods on models diagnosing oral cancer. In addition to using the TCAV method as introduced by its authors, this study also utilizes the CAV-sensitivity to introduce and perform a sensitivity magnitude analysis. Both methods proved useful in discerning between the model's two decision-making strategies based on either the animal or the noise, though greater insight into the intricacies of those strategies is still desired. Additionally, the methods provided a deeper understanding of the model's learning, as the model did not seem to properly distinguish between the noise and the animal conceptually. The methods thus accentuated the limitations of the model, thereby increasing our trust in its abilities. In conclusion, the methods show promise regarding the task of detecting visually distinctive noise in images, which could extend to other distinctive features present in more complex problems. Consequently, more research should be conducted on applying these methods to more complex areas with specialized models and tasks, e.g. oral cancer.
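TCAV's core computation, learning a concept activation vector and measuring model sensitivity along it, is brief; a hedged NumPy/scikit-learn sketch follows, with random arrays standing in for real layer activations and logit gradients:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64  # width of the chosen hidden layer

# Activations at that layer for concept examples (e.g., the introduced noise)
# versus random counterexamples; in practice these come from forward hooks.
concept_acts = rng.normal(0.5, 1.0, size=(100, d))
random_acts = rng.normal(0.0, 1.0, size=(100, d))

# The CAV is the normal vector of a linear classifier separating the two sets.
clf = LogisticRegression(max_iter=1000).fit(
    np.vstack([concept_acts, random_acts]),
    np.array([1] * 100 + [0] * 100))
cav = clf.coef_.ravel() / np.linalg.norm(clf.coef_)

# CAV-sensitivity: directional derivative of the class logit along the CAV,
# i.e. grad(logit) . cav, evaluated per input of the class under study.
grads = rng.normal(size=(50, d))        # stand-in for per-input logit gradients
sensitivities = grads @ cav
tcav_score = float((sensitivities > 0).mean())
print(f"TCAV score: {tcav_score:.2f}")  # fraction of inputs the concept pushes positively
```

The thesis's sensitivity magnitude analysis extends this by looking at the size of `sensitivities`, not just their sign.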
6.
Hluboké neuronové sítě: implementace pro vestavěné systémy / Deep Neural Networks: Embedded System Implementation. Matěj, Aleš. January 2018.
The goal of this thesis is, firstly, to design and implement an application for embedded systems that classifies MNIST digits and, secondly, to optimize the energy and memory requirements of this network. The theoretical part describes the basics of neural networks, Cortex-M processor cores, and embedded devices, followed by implementation details. Network training is done with Python and the Theano library on a PC. The network is then converted to C for an STM32F429 Discovery board. The last part consists of network optimization, focusing on convolution, dot products, and the number representation of the network's weights and biases.
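The abstract does not detail the number representation; a common choice on Cortex-M, and a plausible reading, is 8-bit fixed point (Q7, as used by CMSIS-NN kernels). A NumPy sketch of that quantization, offered as an assumption rather than the thesis's exact scheme:

```python
import numpy as np

Q7_SCALE = 2 ** 7  # Q7: 1 sign bit, 7 fraction bits, values in [-1, 1)

def quantize_q7(weights: np.ndarray) -> np.ndarray:
    """Round float weights to int8 fixed point, clipping to the Q7 range."""
    return np.clip(np.round(weights * Q7_SCALE), -128, 127).astype(np.int8)

def dequantize_q7(q: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) / Q7_SCALE

w = np.random.default_rng(0).uniform(-1, 1, 1000).astype(np.float32)
q = quantize_q7(w)
print(f"memory: {w.nbytes} B (float32) -> {q.nbytes} B (int8)")  # 4000 B -> 1000 B
print(f"max error: {np.abs(w - dequantize_q7(q)).max():.4f}")    # ~1/256, up to ~1/128 near +1
```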
7.
INTELLIGENT SOLID WASTE CLASSIFICATION SYSTEM USING DEEP LEARNING. Michel K Mudemfu (13558270). 31 July 2023.
The proper classification and disposal of waste are crucial in reducing environmental impacts and promoting sustainability. Several solid waste classification systems have been developed over the years, ranging from manual sorting to mechanical and automated sorting. Manual sorting is the oldest and most commonly used method, but it is time-consuming and labor-intensive. Mechanical sorting is more efficient and cost-effective, but it is not always accurate and requires constant maintenance. Automated sorting systems use different types of sensors and algorithms to classify waste, making them more accurate and efficient than manual and mechanical systems. In this thesis, we propose the development of an intelligent solid waste detection, classification, and tracking system using deep learning techniques. To address the limited samples in the TrashNetV2 dataset and enhance model performance, a data augmentation process was implemented, aimed at preventing overfitting and mitigating data scarcity while improving the model's robustness. Various augmentation techniques were employed: random rotation within a range of -20° to 20° to account for different orientations of the recycled materials; a random blur of up to 1.5 pixels to simulate slight variations in image quality during acquisition; random horizontal and vertical flipping to accommodate variations in the appearance of materials based on their orientation within the image; random scaling to 416 by 416 pixels, maintaining a consistent image size while increasing the dataset's overall size; random cropping with a zoom level between 0% and 25%; and random hue shifts within -20° to 20° to replicate variations in lighting conditions during acquisition. These techniques collectively improved the dataset's diversity and the model's performance. YOLOv8, EfficientNet-B0, and VGG16 architectures were evaluated, with both stochastic gradient descent (SGD) and Adam used as optimizers; SGD provided better test accuracies than Adam.
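Those augmentations map closely onto off-the-shelf transforms; a hedged torchvision sketch approximating the pipeline described above (parameter translations such as hue 20° ≈ 20/360 and 25% zoom ≈ cropping to 75-100% of the area are assumptions):

```python
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=20),                     # rotate within -20° to 20°
    transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 1.5)),  # blur up to ~1.5 px
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomResizedCrop(416, scale=(0.75, 1.0)),      # crop/zoom, output 416x416
    transforms.ColorJitter(hue=20 / 360),                      # hue shift within ±20°
    transforms.ToTensor(),
])

img = Image.new("RGB", (512, 384), "white")  # stand-in for a TrashNetV2 image
x = augment(img)
print(x.shape)  # torch.Size([3, 416, 416])
```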
Among the three models, YOLOv8 showed the best performance, with the highest mean average precision (mAP) of 96.5% and ROC values ranging from 92.70% (Metal) to 98.40% (Cardboard). The YOLOv8 model therefore outperforms both VGG16 and EfficientNet in terms of ROC values and mAP. The findings demonstrate that our novel classifier-tracker system, built from YOLOv8 and supervision algorithms, surpasses conventional deep learning methods in precision, resilience, and generalization ability. Our contribution to waste management is the development and implementation of an intelligent solid waste detection, classification, and tracking system using computer vision and deep learning techniques. The system can accurately detect, classify, and localize various types of solid waste on a moving conveyor, including cardboard, glass, metal, paper, and plastic, which can significantly improve the efficiency and accuracy of waste sorting processes.
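Assuming the "supervision" referred to is the Roboflow supervision library, the detect-and-track loop over conveyor frames can be as small as this (the weights file and returned fields are placeholders, not the thesis's trained model):

```python
import supervision as sv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # placeholder; the thesis trains its own waste classes
tracker = sv.ByteTrack()     # keeps persistent IDs as items move along the conveyor

def process(frame):
    result = model(frame, verbose=False)[0]
    detections = sv.Detections.from_ultralytics(result)
    detections = tracker.update_with_detections(detections)
    # Each detection now carries a box, class id, confidence, and tracker id.
    return list(zip(detections.tracker_id, detections.class_id, detections.confidence))
```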
This research provides a promising solution for the detection, classification, localization, and tracking of solid waste materials in a real-time system, which can be further integrated into existing waste management systems. Through comprehensive experimentation and analysis, we demonstrate the superiority of our approach over traditional methods, with higher accuracy and faster processing times. Our findings provide a compelling case for the implementation of intelligent solid waste sorting.
8.
OBJECT DETECTION USING VISION TRANSFORMED EFFICIENTDET. Shreyanil Kar (16285265). 30 August 2023.
This research presents a novel approach for object detection by integrating Vision Transformers (ViT) into the EfficientDet architecture. The field of computer vision, encompassing artificial intelligence, focuses on the interpretation and analysis of visual data. Recent advancements in deep learning, particularly convolutional neural networks (CNNs), have significantly improved the accuracy and efficiency of computer vision systems. Object detection, a widely studied application within computer vision, involves the identification and localization of objects in images.
The ViT backbone, renowned for its success in image classification and natural language processing tasks, employs self-attention mechanisms to capture global dependencies in input images. However, ViT's capability to capture fine-grained details and context information is limited. To address this limitation, the integration of ViT into the EfficientDet architecture is proposed. EfficientDet is recognized for its efficiency and accuracy in object detection. By combining the strengths of ViT and EfficientDet, the proposed integration enhances the network's ability to capture fine-grained details and context information. It leverages ViT's global dependency modeling alongside EfficientDet's efficient object detection framework, resulting in highly accurate and efficient performance. Noteworthy object detection frameworks utilized in the industry, such as RetinaNet, EfficientNet, and EfficientDet, primarily employ convolution.
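A condensed PyTorch sketch of the idea, a ViT-style encoder serving as the backbone that feeds a dense detection head, is below; the tiny dimensions and the single-scale head are illustrative assumptions, not the thesis architecture:

```python
import torch
import torch.nn as nn

class ViTBackboneDetector(nn.Module):
    """Toy detector: transformer encoder over patch tokens, conv head on the token grid."""
    def __init__(self, img=224, patch=16, dim=192, classes=20):
        super().__init__()
        self.grid = img // patch                                # 14x14 token grid
        self.embed = nn.Conv2d(3, dim, patch, stride=patch)     # patch embedding
        self.pos = nn.Parameter(torch.zeros(1, self.grid ** 2, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)  # global self-attention
        self.head = nn.Conv2d(dim, 4 + 1 + classes, 1)          # box + objectness + classes per cell

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2) + self.pos
        tokens = self.encoder(tokens)
        fmap = tokens.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        return self.head(fmap)

net = ViTBackboneDetector()
print(net(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 25, 14, 14])
```

EfficientDet proper would fuse multi-scale features through a BiFPN; this sketch keeps only a single-scale skeleton to show where the transformer replaces the CNN backbone.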
Experimental evaluations were conducted using the PASCAL VOC 2007 and 2012 datasets, widely acknowledged benchmarks for object detection. The integrated ViT-EfficientDet model achieved an impressive mean Average Precision (mAP) score of 86.27% when tested on the PASCAL VOC 2007 dataset, demonstrating its superior accuracy. These results underscore the potential of the proposed integration for real-world applications.
In conclusion, the research introduces a novel integration of Vision Transformers into the EfficientDet architecture, yielding significant improvements in object detection performance. By combining ViT's ability to capture global dependencies with EfficientDet's efficiency and accuracy, the proposed approach offers enhanced object detection capabilities. Future research directions may explore additional datasets and evaluate the performance of the proposed framework across various computer vision tasks.
9.
A DEEP LEARNING BASED FRAMEWORK FOR NOVELTY AWARE EXPLAINABLE MULTIMODAL EMOTION RECOGNITION WITH SITUATIONAL KNOWLEDGE. Mijanur Palash (16672533). 03 August 2023.
Mental health significantly impacts issues like gun violence, school shootings, and suicide. There is a strong connection between mental health and emotional states: by monitoring emotional changes over time, we can identify triggering events, detect early signs of instability, and take preventive measures. This thesis focuses on the development of a generalized and modular system for human emotion recognition and explanation based on visual information. The aim is to address the challenges of effectively utilizing the different cues (modalities) available in the data for a reliable and trustworthy emotion recognition system. The face is one of the most important mediums through which we express emotion, so we first propose SAFER, a novel facial emotion recognition system with background and place features, together with a detailed evaluation framework demonstrating its high accuracy and generalizability. However, relying solely on facial expressions for emotion recognition can be unreliable, as faces can be covered or deceptive. To enhance the system's reliability, we introduce EMERSK, a multimodal emotion recognition system that integrates various modalities, including facial expressions, posture, gait, and scene background, in a flexible and modular manner. It employs convolutional neural networks (CNNs), Long Short-Term Memory (LSTM) networks, and denoising auto-encoders to extract features from facial images, posture, gait, and scene background. In addition to multimodal feature fusion, the system utilizes situational knowledge derived from place type and adjective-noun pairs (ANP) extracted from the scene, as well as the spatio-temporal average distribution of emotions, to generate comprehensive explanations for the recognition outcomes. Extensive experiments on different benchmark datasets demonstrate the superiority of our approach over existing state-of-the-art methods in accurately recognizing and explaining human emotions. Moreover, we investigate the impact of novelty, such as face masks during the Covid-19 pandemic, on emotion recognition. The study critically examines the limitations of mainstream facial expression datasets and proposes a novel dataset specifically tailored for facial emotion recognition with masked subjects. Additionally, we propose a continuous learning-based approach that incorporates a novelty detector working in parallel with the classifier to detect and properly handle instances of novelty, ensuring robustness and adaptability even in the presence of novel factors such as face masks. This thesis contributes to the field of automatic emotion recognition by providing a generalized and modular approach that effectively combines multiple modalities, ensuring reliable and highly accurate recognition, and by generating situational knowledge valuable for mission-critical applications along with comprehensive explanations of the output. The findings and insights from this research have the potential to enhance the understanding and utilization of multimodal emotion recognition systems in various real-world applications.
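The multimodal fusion EMERSK performs can be illustrated with a toy late-fusion network; all sizes, the seven emotion classes, and the pose-keypoint gait encoding are assumptions for illustration:

```python
import torch
import torch.nn as nn

class LateFusionEmotionNet(nn.Module):
    """CNN features for face and scene, LSTM over a gait sequence, concatenated."""
    def __init__(self, emotions=7):
        super().__init__()
        def small_cnn():
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())   # -> (B, 16)
        self.face_cnn, self.scene_cnn = small_cnn(), small_cnn()
        self.gait_lstm = nn.LSTM(input_size=34, hidden_size=32, batch_first=True)
        self.classifier = nn.Linear(16 + 16 + 32, emotions)

    def forward(self, face, scene, gait_seq):
        _, (h, _) = self.gait_lstm(gait_seq)             # last hidden state summarizes gait
        fused = torch.cat([self.face_cnn(face), self.scene_cnn(scene), h[-1]], dim=1)
        return self.classifier(fused)

net = LateFusionEmotionNet()
logits = net(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64),
             torch.randn(2, 30, 34))  # 30 frames of 17 2-D pose keypoints
print(logits.shape)  # torch.Size([2, 7])
```

The modularity the thesis emphasizes shows up here: a missing modality (say, a masked face) can in principle be handled in one branch without disturbing the others.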
10.
Algorithm And Architecture Design for Real-time Face Recognition. Mahale, Gopinath Vasanth. January 2016.
Face recognition is a field of biometrics that deals with the identification of subjects based on features present in images of their faces. The factors that make face recognition more popular than other biometric methods are its easier operation and its ability to identify subjects without their knowledge. With these features, face recognition has become an integral part of present-day security systems, targeting a smart and secure world.
There are various factors that define the performance of a face recognition system. The most important among them are the recognition accuracy of the algorithm used and the time taken for recognition. Recognition accuracy is affected by changes in pose, facial expression, and illumination, along with occlusions in the images. A number of algorithms have been proposed to enable recognition under these ambient changes. However, it has been hard to find a single algorithm that can efficiently recognize faces under all of the above conditions. Moreover, achieving real-time performance for most complex face recognition algorithms on embedded platforms has been a challenge. Real-time performance is highly preferred in critical applications such as the identification of crime suspects in public. As available software solutions for FR have significantly large latency in recognizing individuals, they are not suitable for such critical real-time applications. This thesis focuses on the real-time aspect of FR, where acceleration of the algorithms is achieved by means of parallel hardware architectures.
The major contributions of this work are as follows. We target a face recognition system that can identify at most 30 faces in each frame of video at 15 frames per second, which amounts to 450 recognitions per second. In addition, we target good recognition accuracy along with scalability in terms of database size and input image resolution. To design a system with these specifications, as a first step, we explore algorithms in the literature and devise a hybrid face recognition algorithm. This hybrid algorithm shows good recognition accuracy on face images with changes in illumination, pose, and expression, as well as with occlusions. In addition, the computations in the algorithm are modular in nature, making them suitable for real-time realization through parallel processing.
The face recognition system consists of a face detection module to detect faces in the input image, followed by a face recognition module to identify the detected faces. There are well-established algorithms and architectures for face detection in the literature which can perform detection at 15 frames per second on video frames. Detected faces of different sizes need to be scaled to the size specified by the face recognition module. To meet the real-time constraints, we propose a hardware architecture for real-time bi-cubic convolution interpolation with dynamic scaling factors. To recognize the resized faces in real time, a scalable parallel pipelined architecture is designed for the hybrid algorithm, which can perform 450 recognitions per second on a database containing grayscale images of at most 450 classes on a Virtex 6 FPGA. To provide flexibility and programmability, we extend this design to REDEFINE, a multi-core massively parallel reconfigurable architecture. In this design, we come up with FR-specific programmable cores termed Scalable Units for Region Evaluation (SUREs), capable of performing the modular computations in the hybrid face recognition algorithm. We replicate SUREs in each tile of REDEFINE to construct a face recognition module termed REDEFINE for Face Recognition using SURE Homogeneous Cores (REFRESH).
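The bi-cubic convolution interpolation being pipelined uses Keys' cubic kernel; a NumPy sketch of 1-D resampling with a dynamic scale factor shows the arithmetic the hardware parallelizes (2-D scaling applies the same step along rows, then columns):

```python
import numpy as np

def keys_kernel(x: np.ndarray, a: float = -0.5) -> np.ndarray:
    """Keys' bi-cubic convolution kernel; a = -0.5 is the classic choice."""
    x = np.abs(x)
    out = np.zeros_like(x)
    near, far = x <= 1, (x > 1) & (x < 2)
    out[near] = (a + 2) * x[near] ** 3 - (a + 3) * x[near] ** 2 + 1
    out[far] = a * x[far] ** 3 - 5 * a * x[far] ** 2 + 8 * a * x[far] - 4 * a
    return out

def resample_1d(signal: np.ndarray, scale: float) -> np.ndarray:
    """Resample by an arbitrary (dynamic) scale factor using 4 taps per output."""
    n_out = int(round(len(signal) * scale))
    out = np.empty(n_out)
    for i in range(n_out):
        src = i / scale                               # position on the input grid
        base = int(np.floor(src))
        taps = np.arange(base - 1, base + 3)          # 4 neighboring samples
        idx = np.clip(taps, 0, len(signal) - 1)       # replicate at the borders
        out[i] = signal[idx] @ keys_kernel(src - taps)
    return out

print(resample_1d(np.arange(8, dtype=float), 2.0))  # interior of a ramp is reproduced exactly
```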
Practical face recognition systems also need to learn new, unseen faces on-line. Considering this, for real-time on-line learning of unseen face images, we design tiny processors termed VOPs (Processors for Vector Operations). VOPs function as co-processors to the processing elements under each tile of REDEFINE, accelerating the micro vector operations that appear in the synaptic weight computations. We also explore deep neural networks, which operate similarly to processing in the human brain and are capable of working on very large face databases. We draw on random matrix theory to arrive at a solution for synaptic weight initialization in deep neural networks for better classification. In addition, we perform a design space exploration of hardware architectures for deep convolutional networks, and conclude with directions for future work.