81 |
Design Space Exploration of Convolutional Neural Networks for Image Classification
Shah, Prasham 12 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Computer vision is a domain that aims to make technology as capable as human vision. Toward that goal, after decades of research, researchers have developed algorithms that work efficiently on resource-constrained hardware such as mobile or embedded devices. As a result, such devices have become capable of tasks like image classification, object detection, object recognition, semantic segmentation, and many other applications. Autonomous systems such as self-driving cars, drones, and UAVs are being successfully developed because of these advances in AI.
Deep Learning, a part of AI, is a specific domain of Machine Learning that focuses on developing algorithms for such applications. Deep Learning deals with tasks like extracting features from raw image data, replacing pipelines of specialized models with single end-to-end models, and making models usable for multiple tasks with superior performance. A major focus is on techniques to detect and extract features that provide better context for inference about an image or video stream. The multiple deep layers of CNN models make it possible to learn and automatically extract a deep hierarchy of rich features from images.
CNNs are the backbone of Computer Vision. CNNs are the focus of attention among deep learning models because they were specifically designed for image data. They are complicated but very effective at extracting features from an image or a video stream. After AlexNet won the ILSVRC in 2012, there was a drastic increase in research related to CNNs. Many state-of-the-art architectures like VGG Net, GoogLeNet, ResNet, Inception-v4, Inception-Resnet-v2, ShuffleNet, Xception, MobileNet, MobileNetV2, SqueezeNet, SqueezeNext and many more were introduced. The research trend was to increase the number of layers in CNNs to make them more effective, but the size of the models increased as well. This problem was addressed by new algorithms that reduced model size.
As a result, today we have CNN models that are implemented on mobile devices. These mobile models are compact and have low latency, which in turn reduces the computational cost of the embedded system. This thesis follows a similar idea: it proposes two new CNN architectures, A-MnasNet and R-MnasNet, which have been derived from MnasNet by Design Space Exploration. These architectures outperform MnasNet in terms of model size and accuracy. They have been trained and tested on the CIFAR-10 dataset. Furthermore, they were implemented on NXP Bluebox 2.0, an autonomous driving platform, for image classification.
|
82 |
A Novel Deep Learning Approach for Emotion Classification
Ayyalasomayajula, Satya Chandrashekhar 14 February 2022
Neural networks are at the core of computer vision solutions for various applications. With the advent of deep neural networks, Facial Expression Recognition (FER) has become an unavoidable and challenging task in the field of computer vision. Micro-expressions (MEs) are used prominently in security, psychotherapy, and neuroscience, and have a wide role in several related disciplines. However, because the underlying facial muscle movements are subtle, micro-expressions are difficult to detect and identify. For these reasons, emotion detection and classification have long been active research topics. Networks recently adopted for FER have yet to address overfitting caused by insufficient training data and by expression-unrelated variations such as gender bias and face occlusions. Combining FER with Speech Emotion Recognition (SER) led to the development of multimodal neural networks for emotion classification, in which sensors played a significant role: by providing high-quality inputs, they substantially increased accuracy and further improved the efficiency of the system. This thesis explores different principles behind the application of deep neural networks, with a strong focus on Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs), with regard to emotion recognition. A motion magnification algorithm for ME detection and classification was implemented for applications requiring near real-time computation. A new and improved architecture using a multimodal network was implemented. In addition to the motion magnification technique for emotion classification and extraction, the multimodal algorithm takes audio-visual cues as inputs and reads the MEs on the real face of the participant.
This feature of the above architecture can be deployed when conducting interviews, supervising ICU patients in hospitals, in the auto industry, and in many other settings. The real-time emotion classifier, based on a state-of-the-art Image-Avatar Animation model, was tested on simulated subjects. The salient features of the real face are mapped onto avatars that are built with a 3D scene generation platform. In pursuit of the goal of emotion classification, the Image Animation model outperforms all baselines and prior works. Extensive tests and the results obtained demonstrate the validity of the approach.
|
83 |
Deep Learning Approach for Vision Navigation in Flight
McNally, Branden Timothy January 2018
No description available.
|
84 |
Improved U-Net architecture for Crack Detection in Sand Moulds
Ahmed, Husain; Bajo, Hozan January 2023
The detection of cracks in sand moulds has long been a challenge for both safety and maintenance purposes. Traditional image processing techniques have been employed to identify and quantify these defects but have often proven to be inefficient, labour-intensive, and time-consuming. To address this issue, we sought to develop a more effective approach using deep learning techniques, specifically semantic segmentation. We initially examined three different architectures (U-Net, SegNet, and DeepCrack) to evaluate their performance in crack detection. Through testing and comparison, U-Net emerged as the most suitable choice for our project. To further enhance the model's accuracy, we combined U-Net with VGG-19, VGG-16, and ResNet architectures. However, these combinations did not yield the expected improvements in performance. Consequently, we introduced a new layer to the U-Net architecture, which significantly increased its accuracy and F1 score, making it more efficient for crack detection. Throughout the project, we conducted extensive comparisons between models to better understand the effects of various techniques such as batch normalization and dropout. The models were trained with the Adam optimizer, and we used the loss, accuracy, and F1 score as evaluation metrics to compare their performance. Tables and figures illustrate the differences between the models through image comparisons and evaluation-metric comparisons, showing which model performs better. The evaluations revealed that the U-Net architecture, when enhanced with an extra layer, proved superior to the other models, achieving the highest scores and accuracy. This architecture has shown itself to be the most effective model for crack detection, thereby laying the foundation for a more cost-efficient and trustworthy approach to detecting and monitoring structural deficiencies.
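The abstract reports both accuracy and F1 score. A minimal sketch of why the two can disagree on crack masks, assuming binary masks flattened to lists of 0/1 (the function names and the toy data are illustrative and not taken from the thesis):

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels where the predicted mask equals the ground truth."""
    return sum(1 for p, t in zip(pred, target) if p == t) / len(target)


def f1_score(pred, target):
    """F1 score of the positive (crack) class for two flat 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    if tp == 0:
        return 0.0  # no crack pixel found (also avoids division by zero)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical mask: 5 crack pixels out of 100, and a model that predicts "no crack" everywhere.
target = [0] * 95 + [1] * 5
all_background = [0] * 100
print(pixel_accuracy(all_background, target), f1_score(all_background, target))
```

Because crack pixels are usually a small share of the image, a prediction that misses every crack can still score high pixel accuracy, which is one reason to report F1 alongside it.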
|
85 |
A systematic study of the class imbalance problem in convolutional neural networks
Buda, Mateusz January 2017
In this study, we systematically investigate the impact of class imbalance on the classification performance of convolutional neural networks and compare frequently used methods to address the issue. Class imbalance refers to significantly different numbers of examples among classes in a training set. It is a common problem that has been comprehensively studied in classical machine learning, yet very limited systematic research is available in the context of deep learning. We define and parameterize two representative types of imbalance, i.e. step and linear. Using three benchmark datasets of increasing complexity, MNIST, CIFAR-10 and ImageNet, we investigate the effects of imbalance on classification and perform an extensive comparison of several methods to address the issue: oversampling, undersampling, two-phase training, and thresholding that compensates for prior class probabilities. Our main evaluation metric is the area under the receiver operating characteristic curve (ROC AUC) adjusted to multi-class tasks, since the overall accuracy metric is associated with notable difficulties in the context of imbalanced data. Based on results from our experiments we conclude that (i) the effect of class imbalance on classification performance is detrimental and increases with the extent of imbalance and the scale of a task; (ii) the method of addressing class imbalance that emerged as dominant in almost all analyzed scenarios was oversampling; (iii) oversampling should be applied to the level that totally eliminates the imbalance, whereas undersampling can perform better when the imbalance is only removed to some extent; (iv) thresholding should be applied to compensate for prior class probabilities when the overall number of properly classified cases is of interest; (v) as opposed to some classical machine learning models, oversampling does not necessarily cause overfitting of convolutional neural networks.
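Two of the methods named in the abstract, oversampling to full balance and thresholding that compensates for prior class probabilities, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code: oversampling is done by random duplication, thresholding is taken to mean dividing each predicted probability by the class's training-set prior and renormalising, and the names `oversample` and `threshold_by_prior` are hypothetical.

```python
import random
from collections import Counter


def oversample(examples, labels, seed=0):
    """Randomly duplicate minority-class examples until every class
    has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    largest = max(counts.values())
    out_x, out_y = list(examples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(examples, labels) if y == cls]
        for _ in range(largest - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def threshold_by_prior(probs, priors):
    """Divide each class probability by its training prior, then renormalise."""
    scaled = [p / q for p, q in zip(probs, priors)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Toy data: class 0 has four examples, class 1 has two.
x, y = oversample(["a", "b", "c", "d", "e", "f"], [0, 0, 0, 0, 1, 1])
print(Counter(y))

# A network trained with priors 0.9 / 0.1 outputs 0.6 / 0.4 for one input.
print(threshold_by_prior([0.6, 0.4], [0.9, 0.1]))
```

In the second call the raw output favours class 0, while the prior-compensated scores favour the rare class 1, which is the intended effect of this kind of thresholding.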
|
86 |
Object Detection Using Feature Extraction and Deep Learning for Advanced Driver Assistance Systems
Reza, Tasmia 10 August 2018
This thesis compares the performance of traditional support vector machine (SVM), single-kernel, multiple kernel learning (MKL), and modern deep learning (DL) classifiers. The goal is to implement different machine-learning classification systems for object detection in three-dimensional (3D) Light Detection and Ranging (LiDAR) data. The linear SVM, non-linear single-kernel, and MKL classifiers require hand-crafted features for training and testing. The DL approach learns the features itself while training. At the end of these studies, an assessment of all the different classification methods is presented.
|
87 |
AUTONOMOUS SAFE LANDING ZONE DETECTION FOR UAVs UTILIZING MACHINE LEARNING
Nepal, Upesh 01 May 2022
One of the main challenges of the integration of unmanned aerial vehicles (UAVs) into today’s society is the risk of in-flight failures, such as motor failure, occurring in populated areas that can result in catastrophic accidents. We propose a framework to manage the consequences of an in-flight system failure and to bring down the aircraft safely without causing any serious accident to people, property, and the UAV itself. This can be done in three steps: a) Detecting a failure, b) Finding a safe landing spot, and c) Navigating the UAV to the safe landing spot. In this thesis, we will look at part b. Specifically, we are working to develop an active system that can detect landing sites autonomously without any reliance on UAV resources. To detect a safe landing site, we are using a deep learning algorithm named "You Only Look Once" (YOLO) that runs on a Jetson Xavier NX computing module, which is connected to a camera, for image processing. YOLO is trained using the DOTA dataset and we show that it can detect landing spots and obstacles effectively. Then by avoiding the detected objects, we find a safe landing spot. The effectiveness of this algorithm will be shown first by comprehensive simulations. We also plan to experimentally validate this algorithm by flying a UAV and capturing ground images, and then applying the algorithm in real-time to see if it can effectively detect acceptable landing spots.
|
88 |
Real-time hand pose estimation on a smart-phone using Deep Learning
Gourmet, Valentin January 2019
Hand pose estimation is a computer vision challenge that consists of detecting the coordinates of a hand’s key points in an image. This research investigates several deep learning-based solutions to determine whether it is possible to improve current state-of-the-art detectors for smartphone applications. Several models are tested and compared based on accuracy, processing speed and memory size. A final network is selected and described in detail to compare it to the state of the art. The proposed solution is obtained by combining the Differentiable Spatial to Numerical Transform layer, which predicts numerical coordinates, with the Fire module presented in the SqueezeNet architecture. This deep neural network contains around 1 million parameters and is able to outperform the current best documented model in all the metrics described above. A qualitative analysis is also performed to examine the predictions of the final solution on test images.
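The idea behind a Differentiable Spatial to Numerical Transform layer is to turn a key-point heatmap into coordinates by taking an expectation rather than an argmax. A minimal sketch in plain Python, assuming softmax normalisation of the heatmap and pixel-centre coordinates scaled to the range -1 to 1 (both are assumptions about the variant used; the function name `dsnt` is illustrative and this is not the thesis implementation):

```python
import math


def dsnt(heatmap):
    """Map a 2-D heatmap of raw scores (list of rows) to (x, y) coordinates.

    The scores are normalised with a softmax so they sum to 1, then the
    coordinates are the probability-weighted mean of the pixel centres.
    """
    h, w = len(heatmap), len(heatmap[0])
    peak = max(max(row) for row in heatmap)  # subtract the max for numerical stability
    exp = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in exp)
    x = sum(exp[i][j] / total * ((2 * j + 1) / w - 1) for i in range(h) for j in range(w))
    y = sum(exp[i][j] / total * ((2 * i + 1) / h - 1) for i in range(h) for j in range(w))
    return x, y


# Hypothetical 3x3 heatmap with a strong response in the top-right pixel.
print(dsnt([[0.0, 0.0, 50.0],
            [0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0]]))
```

Every step is a sum or a product, so gradients can flow from a coordinate loss back into the heatmap, which is what lets a layer of this kind be trained end to end.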
|
89 |
Data-driven sparse computational imaging with deep learning
Mdrafi, Robiulhossain 13 May 2022
Typically, inverse imaging problems deal with the reconstruction of images from sensor measurements, where the sensors can take the form of any imaging modality, such as camera, radar, hyperspectral, or medical imaging systems. In an ideal scenario, we can reconstruct the images by applying an inversion procedure to these sensors’ measurements, but practical applications pose several challenges: the measurement acquisition process is heavily corrupted by noise, the forward model is not exactly known, and non-linearities or unknown physics of the data acquisition play a role. Hence, a perfect inverse function for flawless image reconstruction is not known. To this end, in this dissertation, I propose an automatic sensing and reconstruction scheme based on deep learning within the compressive sensing (CS) framework to solve computational imaging problems. Here, I develop a data-driven approach to learn both the measurement matrix and the inverse reconstruction scheme for a given class of signals, such as images. This approach paves the way for end-to-end learning and reconstruction of signals with the aid of cascaded fully connected and multistage convolutional layers with a weighted loss function in an adversarial learning framework. I also propose to extend this analysis by introducing data-driven models that classify directly from compressed measurements through joint reconstruction and classification. I develop a constrained measurement learning framework and demonstrate higher performance of the proposed approach in typical image reconstruction and hyperspectral image classification tasks. Finally, I also propose a single data-driven network that can take and reconstruct images at multiple rates of signal acquisition.
In summary, this dissertation proposes novel methods for data-driven measurement acquisition for sparse signal reconstruction and classification, for learning measurements under constraints imposed by the hardware requirements of different applications, and for producing a common data-driven platform that learns measurements to reconstruct signals at multiple rates. This dissertation opens the path to learned sensing systems. Future research can use the proposed data-driven approaches as key building blocks for task-specific smart sensors in several real-world applications.
|
90 |
Breast Abnormality Diagnosis Using Transfer and Ensemble Learning
Azour, Farnoosh 02 June 2022
Breast cancer is the second most fatal cancer both in Canada and across the globe. However, early detection can raise the survival rate. Thus, researchers and scientists have been working to develop Computer-Aided Diagnosis (CADx) systems. Traditional CAD systems depend on manual feature extraction, which has provided radiologists with poor detection and diagnosis tools. Recently, however, the application of Convolutional Neural Networks (CNNs), one of the most impressive deep learning-based methods, and of one of their interesting techniques, Transfer Learning, has revolutionized the performance and development of these systems.
In medical diagnosis, one issue is distinguishing between breast mass lesions and calcifications (small deposits of calcium). This work offers a solution that uses transfer learning and ensemble learning with majority voting at the first stage, and later replaces that voting strategy with soft voting. Also, regardless of the abnormality's type (mass or calcification), the severity of the abnormality plays a key role.
In this study, we therefore went further and made an effort to create a CADx pathology diagnosis system. More specifically, after comparing multi-class classification results with a two-stage abnormality diagnosis system, we propose the two-stage binary classifier as our final model.
Thus, we offer a novel breast cancer diagnosis system using a wide range of pre-trained models in this study. To the best of our knowledge, we are the first to integrate a wide range of state-of-the-art pre-trained models, particularly including EfficientNet, for the transfer learning part and subsequently to employ ensemble learning.
By applying pre-trained CNN-based models, that is, transfer learning, we are able to overcome the lack of large datasets. Moreover, with the EfficientNet family offering better results with fewer parameters, we achieved promising results in terms of accuracy and AUC-score, and ensemble learning was then applied to make the network more robust. After performing 10-fold cross-validation, our experiments yielded promising results: 0.96 ± 0.03 for accuracy and 0.96 for AUC-score when constructing the breast abnormality classifier.
Similarly, constructing the pathology diagnosis model resulted in 0.85 ± 0.08 for accuracy and 0.81 for AUC-score.
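The abstract mentions replacing majority voting with soft voting. A minimal sketch of the difference, assuming each ensemble member outputs a list of class probabilities for one image (the helper names and the example probabilities are illustrative, not taken from the thesis):

```python
from collections import Counter


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def majority_vote(model_probs):
    """Each model casts one vote for its top class; ties go to the lowest class index."""
    votes = Counter(argmax(p) for p in model_probs)
    best = max(votes.values())
    return min(c for c, n in votes.items() if n == best)


def soft_vote(model_probs):
    """Average the class probabilities over the models, then take the top class."""
    n_classes = len(model_probs[0])
    mean = [sum(p[c] for p in model_probs) / len(model_probs) for c in range(n_classes)]
    return argmax(mean)


# Hypothetical outputs of three models for one image (class 0 = mass, class 1 = calcification).
probs = [[0.55, 0.45], [0.55, 0.45], [0.05, 0.95]]
print(majority_vote(probs), soft_vote(probs))
```

Here two weakly confident models outvote one highly confident model under majority voting, while soft voting lets the confident prediction carry more weight, so the two strategies can disagree on the same inputs.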
|