81

Deep Learning Models for Context-Aware Object Detection

Arefiyan Khalilabad, Seyyed Mostafa 15 September 2017
In this thesis, we present ContextNet, a novel general object detection framework for incorporating context cues into a detection pipeline. Current deep learning methods for object detection exploit state-of-the-art image recognition networks for classifying the given region-of-interest (ROI) to predefined classes and regressing a bounding-box around it without using any information about the corresponding scene. ContextNet is based on the intuitive idea that cues about the general scene (e.g., kitchen and library) change the priors about the presence or absence of some object classes. We provide a general means for integrating this notion in the decision process about the given ROI by using a network pretrained on scene recognition datasets in parallel to a pretrained network for extracting object-level features for the corresponding ROI. Using comprehensive experiments on PASCAL VOC 2007, we demonstrate the effectiveness of our design choices: the resulting system outperforms the baseline in most object classes and reaches 57.5 mAP (mean Average Precision) on the PASCAL VOC 2007 test set, compared with 55.6 mAP for the baseline. / MS
82

Detekce cizích objektů v rentgenových snímcích hrudníku s využitím metod strojového učení / Detection of foreign objects in X-ray chest images using machine learning methods

Matoušková, Barbora January 2021
Foreign objects in chest X-ray (CXR) images cause complications during automatic image processing. To prevent errors caused by these foreign objects, it is necessary to find them automatically and omit them from the analysis. These are mainly buttons, jewellery, implants, wires and tubes. At the same time, finding pacemakers and other placed devices can help with automatic processing. The aim of this work was to design a method for the detection of foreign objects in CXR. For this task, the Faster R-CNN method with a pre-trained ResNet50 network for feature extraction was chosen; it was trained on 4 000 images and later tested on 1 000 images from a publicly available database. After the optimal learning parameters were found, the trained network achieved 75% precision, 77% recall and a 76% F1 score. However, a certain part of the error is caused by non-uniform annotations of objects in the data, because not all annotated foreign objects are located in the lung area, as stated in the description.
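As a quick consistency check on the figures quoted in this abstract, the F1 score is the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` is an assumption made for this example and is not code from the thesis.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both expressed in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # convention: no true positives at all
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: 75% precision, 77% recall.
reported_f1 = f1_score(0.75, 0.77)
print(round(reported_f1, 2))  # 0.76
```

The result agrees with the 76% F1 score stated in the abstract.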
83

The role of model implementation in neuroscientific applications of machine learning

Abe, Taiga January 2024
In modern neuroscience, large scale machine learning models are becoming increasingly critical components of data analysis. Despite the accelerating adoption of these large scale machine learning tools, there are fundamental challenges to their use in scientific applications that remain largely unaddressed. In this thesis, I focus on one such challenge: variability in the predictions of large scale machine learning models relative to seemingly trivial differences in their implementation. Existing research has shown that the performance of large scale machine learning models (more so than traditional models like linear regression) is meaningfully entangled with design choices such as the hardware components, operating system, software dependencies, and random seed that the corresponding model depends upon. Within the bounds of current practice, there are few ways of controlling this kind of implementation variability across the broad community of neuroscience researchers (making data analysis less reproducible), and little understanding of how data analyses might be designed to mitigate these issues (making data analysis unreliable). This dissertation will present two broad research directions that address these shortcomings. First, I will describe a novel, cloud-based platform for sharing data analysis tools reproducibly and at scale. This platform, called NeuroCAAS, enables developers of novel data analyses to precisely specify an implementation of their entire data analysis, which can then be used automatically by any other user on custom built cloud resources. I show that this approach is able to efficiently support a wide variety of existing data analysis tools, as well as novel tools which would not be feasible to build and share outside of a platform like NeuroCAAS. Second, I conduct two large-scale studies on the behavior of deep ensembles.
Deep ensembles are a class of machine learning model which uses implementation variability to improve the quality of model predictions; in particular, by aggregating the predictions of deep networks over stochastic initialization and training. Deep ensembles simultaneously provide a way to control the impact of implementation variability (by aggregating predictions across random seeds) and also to understand what kind of predictive diversity is generated by this particular form of implementation variability. I present a number of surprising results that contradict widely held intuitions about the performance of deep ensembles as well as the mechanisms behind their success, and show that in many aspects, the behavior of deep ensembles is similar to that of an appropriately chosen single neural network. As a whole, this dissertation presents novel methods and insights focused on the role of implementation variability in large scale machine learning models, and more generally upon the challenges of working with such large models in neuroscience data analysis. I conclude by discussing other ongoing efforts to improve the reproducibility and accessibility of large scale machine learning in neuroscience, as well as long term goals to speed the adoption and reliability of such methods in a scientific context.
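The aggregation step described in this abstract (combining the predictions of networks trained from different random seeds) can be sketched as a simple average of class probabilities. This is a minimal illustration under stated assumptions: the function name `ensemble_average` and the probability values are hypothetical, and the dissertation may aggregate differently.

```python
from statistics import mean


def ensemble_average(member_probs):
    """Average class probabilities across ensemble members.

    member_probs: non-empty list of probability vectors, one per trained
    network (hypothetical inputs; each vector has one entry per class).
    """
    n_classes = len(member_probs[0])
    return [mean(p[c] for p in member_probs) for c in range(n_classes)]


# Three hypothetical networks trained from different seeds, two classes.
probs = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
avg = ensemble_average(probs)
# The ensemble prediction is the class with the highest averaged probability.
predicted_class = max(range(len(avg)), key=avg.__getitem__)
```

Averaging probabilities rather than hard votes keeps each member's confidence in the aggregate, which is one common choice for deep ensembles.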
84

COVID-19 Diagnosis and Segmentation Using Machine Learning Analyses of Lung Computerized Tomography

Mittal, Bhuvan 08 1900
COVID-19 is a highly contagious and virulent disease caused by the severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2). COVID-19 disease induces lung changes observed in lung computerized tomography (CT) and the percentage of those diseased areas on the CT correlates with the severity of the disease. Therefore, segmentation of CT images to delineate the diseased or lesioned areas is a logical first step to quantify disease severity, which will help physicians predict disease prognosis and guide early treatments to deliver more positive patient outcomes. It is therefore crucial to develop automated analysis of CT images to save physicians time and effort. This dissertation proposes CoviNet, a deep three-dimensional convolutional neural network (3D-CNN) to diagnose COVID-19 in CT images. It also proposes CoviNet Enhanced, a hybrid approach with 3D-CNN and support vector machines. It also proposes CoviSegNet and CoviSegNet Enhanced, which are enhanced U-Net models to segment ground-glass opacities and consolidations observed in computerized tomography (CT) images of COVID-19 patients. We trained and tested the proposed approaches using several public datasets of CT images. The experimental results show the proposed methods are highly effective for COVID-19 detection and segmentation and exhibit better accuracy, precision, sensitivity, specificity, F-1 score, Matthews correlation coefficient (MCC), Dice score, and Jaccard index in comparison with recently published studies.
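Two of the segmentation metrics named in this abstract, the Dice score and the Jaccard index, measure the overlap between a predicted mask and a reference mask. The sketch below shows the standard definitions on flat binary masks; the function name `dice_and_jaccard` and the toy masks are assumptions for illustration, not the dissertation's evaluation code.

```python
def dice_and_jaccard(pred, truth):
    """Overlap scores for two binary masks given as flat 0/1 sequences."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection
    if union == 0:
        # Both masks are empty: treat this as perfect agreement.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_size + truth_size)
    jaccard = intersection / union
    return dice, jaccard


# Toy 4-pixel masks: one pixel overlaps out of two marked in each mask.
dice, jaccard = dice_and_jaccard([1, 1, 0, 0], [1, 0, 1, 0])
```

The two scores are monotonically related (Jaccard = Dice / (2 - Dice)), so they rank segmentations identically even though their values differ.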
85

Action Recognition with Knowledge Transfer

Choi, Jin-Woo 07 January 2021
Recent progress on deep neural networks has shown remarkable action recognition performance from videos. The remarkable performance is often achieved by transfer learning: training a model on a large-scale labeled dataset (source) and then fine-tuning the model on the small-scale labeled datasets (targets). However, existing action recognition models do not always generalize well on new tasks or datasets for the following two reasons. i) Current action recognition datasets have a spurious correlation between action types and background scene types. The models trained on these datasets are biased towards the scene instead of focusing on the actual action. This scene bias leads to poor generalization performance. ii) Directly testing the model trained on the source data on the target data leads to poor performance as the source and target distributions are different. Fine-tuning the model on the target data can mitigate this issue. However, manually labeling small-scale target videos is labor-intensive. In this dissertation, I propose solutions to these two problems. For the first problem, I propose to learn scene-invariant action representations to mitigate the scene bias in action recognition models. Specifically, I augment the standard cross-entropy loss for action classification with 1) an adversarial loss for the scene types and 2) a human mask confusion loss for videos where the human actors are invisible. These two losses encourage learning representations unsuitable for predicting 1) the correct scene types and 2) the correct action types when there is no evidence. I validate the efficacy of the proposed method by transfer learning experiments. I transfer the pre-trained model to three different tasks, including action classification, temporal action localization, and spatio-temporal action detection. The results show consistent improvement over the baselines for every task and dataset.
I formulate human action recognition as an unsupervised domain adaptation (UDA) problem to handle the second problem. In the UDA setting, we have many labeled videos as source data and unlabeled videos as target data. We can use already existing labeled video datasets as source data in this setting. The task is to align the source and target feature distributions so that the learned model can generalize well on the target data. I propose 1) aligning the more important temporal part of each video and 2) encouraging the model to focus on action, not the background scene, to learn domain-invariant action representations. The proposed method is simple and intuitive while achieving state-of-the-art performance without training on a lot of labeled target videos. I relax the unsupervised target data setting to a sparsely labeled target data setting. Then I explore the semi-supervised video action recognition, where we have a lot of labeled videos as source data and sparsely labeled videos as target data. The semi-supervised setting is practical as sometimes we can afford a little bit of cost for labeling target data. I propose multiple video data augmentation methods to inject photometric, geometric, temporal, and scene invariances to the action recognition model in this setting. The resulting method shows favorable performance on the public benchmarks. / Doctor of Philosophy / Recent progress on deep learning has shown remarkable action recognition performance. The remarkable performance is often achieved by transferring the knowledge learned from existing large-scale data to the small-scale data specific to applications. However, existing action recognition models do not always work well on new tasks and datasets because of the following two problems. i) Current action recognition datasets have a spurious correlation between action types and background scene types.
The models trained on these datasets are biased towards the scene instead of focusing on the actual action. This scene bias leads to poor performance on the new datasets and tasks. ii) Directly testing the model trained on the source data on the target data leads to poor performance as the source and target distributions are different. Fine-tuning the model on the target data can mitigate this issue. However, manually labeling small-scale target videos is labor-intensive. In this dissertation, I propose solutions to these two problems. To tackle the first problem, I propose to learn scene-invariant action representations to mitigate background scene-biased human action recognition models. Specifically, the proposed method learns representations that cannot predict the scene types and the correct actions when there is no evidence. I validate the proposed method's effectiveness by transferring the pre-trained model to multiple action understanding tasks. The results show consistent improvement over the baselines for every task and dataset. To handle the second problem, I formulate human action recognition as an unsupervised learning problem on the target data. In this setting, we have many labeled videos as source data and unlabeled videos as target data. We can use already existing labeled video datasets as source data in this setting. The task is to align the source and target feature distributions so that the learned model can generalize well on the target data. I propose 1) aligning the more important temporal part of each video and 2) encouraging the model to focus on action, not the background scene. The proposed method is simple and intuitive while achieving state-of-the-art performance without training on a lot of labeled target videos. I relax the unsupervised target data setting to a sparsely labeled target data setting. Here, we have many labeled videos as source data and sparsely labeled videos as target data.
The setting is practical as sometimes we can afford a little bit of cost for labeling target data. I propose multiple video data augmentation methods to inject color, spatial, temporal, and scene invariances to the action recognition model in this setting. The resulting method shows favorable performance on the public benchmarks.
86

Multi-layer Optimization Aspects of Deep Learning and MIMO-based Communication Systems

Erpek, Tugba 20 September 2019
This dissertation addresses multi-layer optimization aspects of multiple input multiple output (MIMO) and deep learning-based communication systems. The initial focus is on the rate optimization for multi-user MIMO (MU-MIMO) configurations; specifically, multiple access channel (MAC) and interference channel (IC). First, the ergodic sum rates of MIMO MAC and IC configurations are determined by jointly integrating the error and overhead effects due to channel estimation (training) and feedback into the rate optimization. Then, we investigated methods that will increase the achievable rate for parallel Gaussian IC (PGIC) which is a special case of MIMO IC where there is no interference between multiple antenna elements. We derive a generalized iterative waterfilling algorithm for power allocation that maximizes the ergodic achievable rate. We verified the sum rate improvement with our proposed scheme through extensive simulation tests. Next, we introduce a novel physical layer scheme for single user MIMO spatial multiplexing systems based on unsupervised deep learning using an autoencoder. Both transmitter and receiver are designed as feedforward neural networks (FNN) and constellation diagrams are optimized to minimize the symbol error rate (SER) based on the channel characteristics. We first evaluate the SER in the presence of a constant Rayleigh-fading channel as a performance upper bound. Then, we quantize the Gaussian distribution and train the autoencoder with multiple quantized channel matrices. The channel is provided as an input to both the transmitter and the receiver. The performance exceeds that of conventional communication systems both when the autoencoder is trained and tested with single and multiple channels and the performance gain is sustained after accounting for the channel estimation error. Moreover, we evaluate the performance with increasing number of quantization points and when there is a difference between training and test channels. 
We show that the performance loss is minimal when training is performed with sufficiently large number of quantization points and number of channels. Finally, we develop a distributed and decentralized MU-MIMO link selection and activation protocol that enables MU-MIMO operation in wireless networks. We verified the performance gains with the proposed protocol in terms of average network throughput. / Doctor of Philosophy / Multiple Input Multiple Output (MIMO) wireless systems include multiple antennas both at the transmitter and receiver and they are widely used today in cellular and wireless local area network systems to increase robustness, reliability and data rate. Multi-user MIMO (MU-MIMO) configurations include multiple access channel (MAC) where multiple transmitters communicate simultaneously with a single receiver; interference channel (IC) where multiple transmitters communicate simultaneously with their intended receivers; and broadcast channel (BC) where a single transmitter communicates simultaneously with multiple receivers. Channel state information (CSI) is required at the transmitter to precode the signal and mitigate interference effects. This requires CSI to be estimated at the receiver and transmitted back to the transmitter in a feedback loop. Errors occur during both channel estimation and feedback processes. We initially analyze the achievable rate of MAC and IC configurations when both channel estimation and feedback errors are taken into account in the capacity formulations. We treat the errors associated with channel estimation and feedback as additional noise. Next, we develop methods to maximize the achievable rate for IC by using interference cancellation techniques at the receivers when the interference is very strong. We consider parallel Gaussian IC (PGIC) which is a special case of MIMO IC where there is no interference between multiple antenna elements. 
We develop a power allocation scheme which maximizes the ergodic achievable rate of the communication systems. We verify the performance improvement with our proposed scheme through simulation tests. Standard optimization techniques are used to determine the fundamental limits of MIMO communications systems. However, there is still a gap between current operational systems and these limits due to complexity of these solutions and limitations in their assumptions. Next, we introduce a novel physical layer scheme for MIMO systems based on machine learning; specifically, unsupervised deep learning using an autoencoder. An autoencoder consists of an encoder and a decoder that compresses and decompresses data, respectively. We designed both the encoder and the decoder as feedforward neural networks (FNNs). In our case, encoder performs transmitter functionalities such as modulation and error correction coding and decoder performs receiver functionalities such as demodulation and decoding as part of the communication system. Channel is included as an additional layer between the encoder and decoder. By incorporating the channel effects in the design process of the autoencoder and jointly optimizing the transmitter and receiver, we demonstrate the performance gains over conventional MIMO communication schemes. Finally, we develop a distributed and decentralized MU-MIMO link selection and activation protocol that enables MU-MIMO operation in wireless networks. We verified the performance gains with the proposed protocol in terms of average network throughput.
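For orientation, the classic single-user water-filling rule that power allocation schemes of this kind build on can be sketched as follows. This is not the generalized iterative algorithm derived in the dissertation: it is the textbook version for parallel Gaussian channels, and the function name `waterfill` and its inputs are assumptions for illustration.

```python
def waterfill(noise, total_power):
    """Classic water-filling over parallel Gaussian channels.

    noise: non-empty list of per-channel noise-to-gain levels (hypothetical).
    Returns powers p_i = max(0, mu - noise_i) whose sum equals total_power,
    where mu is the common "water level".
    """
    levels = sorted(noise)
    # Drop the noisiest channels one at a time until the water level
    # sits strictly above every channel that remains active.
    for k in range(len(levels), 0, -1):
        mu = (total_power + sum(levels[:k])) / k
        if mu > levels[k - 1]:
            break
    return [max(0.0, mu - n) for n in noise]


# Three channels, total power 3: the noisiest channel receives nothing.
powers = waterfill([1.0, 2.0, 3.0], 3.0)
```

Quieter channels receive more power, and channels whose noise level lies at or above the water level are switched off entirely.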
87

Predicting the Effects of Sedative Infusion on Acute Traumatic Brain Injury Patients

McCullen, Jeffrey Reynolds 09 April 2020
Healthcare analytics has traditionally relied upon linear and logistic regression models to address clinical research questions mostly because they produce highly interpretable results [1, 2]. These results contain valuable statistics such as p-values, coefficients, and odds ratios that provide healthcare professionals with knowledge about the significance of each covariate and exposure for predicting the outcome of interest [1]. Thus, they are often favored over new deep learning models that are generally more accurate but less interpretable and scalable. However, the statistical power of linear and logistic regression is contingent upon satisfying modeling assumptions, which usually requires altering or transforming the data, thereby hindering interpretability. Thus, generalized additive models are useful for overcoming this limitation while still preserving interpretability and accuracy. The major research question in this work involves investigating whether particular sedative agents (fentanyl, propofol, versed, ativan, and precedex) are associated with different discharge dispositions for patients with acute traumatic brain injury (TBI). To address this, we compare the effectiveness of various models (traditional linear regression (LR), generalized additive models (GAMs), and deep learning) in providing guidance for sedative choice. We evaluated the performance of each model using metrics for accuracy, interpretability, scalability, and generalizability. Our results show that the new deep learning models were the most accurate while the traditional LR and GAM models maintained better interpretability and scalability. The GAMs provided enhanced interpretability through pairwise interaction heat maps and generalized well to other domains and class distributions since they do not require satisfying the modeling assumptions used in LR. 
By evaluating the model results, we found that versed was associated with better discharge dispositions while ativan was associated with worse discharge dispositions. We also identified other significant covariates including age, the Northeast region, the Acute Physiology and Chronic Health Evaluation (APACHE) score, Glasgow Coma Scale (GCS), and ethanol level. The versatility of versed may account for its association with better discharge dispositions while ativan may have negative effects when used to facilitate intubation. Additionally, most of the significant covariates pertain to the clinical state of the patient (APACHE, GCS, etc.) whereas most non-significant covariates were demographic (gender, ethnicity, etc.). Though we found that deep learning slightly improved over LR and generalized additive models after fine-tuning the hyperparameters, the deep learning results were less interpretable and therefore not ideal for making the aforementioned clinical insights. However deep learning may be preferable in cases with greater complexity and more data, particularly in situations where interpretability is not as critical. Further research is necessary to validate our findings, investigate alternative modeling approaches, and examine other outcomes and exposures of interest. / Master of Science / Patients with Traumatic Brain Injury (TBI) often require sedative agents to facilitate intubation and prevent further brain injury by reducing anxiety and decreasing level of consciousness. It is important for clinicians to choose the sedative that is most conducive to optimizing patient outcomes. Hence, the purpose of our research is to provide guidance to aid this decision. Additionally, we compare different modeling approaches to provide insights into their relative strengths and weaknesses. 
To achieve this goal, we investigated whether exposure to particular sedatives (fentanyl, propofol, versed, ativan, and precedex) was associated with different hospital discharge locations for patients with TBI. From best to worst, these discharge locations are home, rehabilitation, nursing home, remains hospitalized, and death. Our results show that versed was associated with better discharge locations and ativan was associated with worse discharge locations. The fact that versed is often used for alternative purposes may account for its association with better discharge locations. Further research is needed to investigate this and the possible negative effects of using ativan to facilitate intubation. We also found that other variables that influence discharge disposition are age, the Northeast region, and other variables pertaining to the clinical state of the patient (severity of illness metrics, etc.). By comparing the different modeling approaches, we found that the new deep learning methods were difficult to interpret but provided a slight improvement in performance after optimization. Traditional methods such as linear regression allowed us to interpret the model output and make the aforementioned clinical insights. However, generalized additive models (GAMs) are often more practical because they can better accommodate other class distributions and domains.
88

Application of Deep Learning in Intelligent Transportation Systems

Dabiri, Sina 01 February 2019
The rapid growth of population and the permanent increase in the number of vehicles engender several issues in transportation systems, which in turn call for an intelligent and cost-effective approach to resolve the problems in an efficient manner. A cost-effective approach for improving and optimizing transportation-related problems is to unlock hidden knowledge in ever-increasing spatiotemporal and crowdsourced information collected from various sources such as mobile phone sensors (e.g., GPS sensors) and social media networks (e.g., Twitter). Data mining and machine learning techniques are the major tools for analyzing the collected data and extracting useful knowledge on traffic conditions and mobility behaviors. Deep learning is an advanced branch of machine learning that has enjoyed a lot of success in computer vision and natural language processing fields in recent years. However, deep learning techniques have been applied to only a small number of transportation applications such as traffic flow and speed prediction. Accordingly, my main objective in this dissertation is to develop state-of-the-art deep learning architectures for resolving the transport-related applications that have not been treated by deep learning architectures in much detail, including (1) travel mode detection, (2) vehicle classification, and (3) traffic information system. To this end, an efficient representation for spatiotemporal and crowdsourced data (e.g., GPS trajectories) must also be designed so that it is not only compatible with deep learning architectures but also contains the information needed for solving the task at hand. Furthermore, since the good performance of a deep learning algorithm is primarily contingent on access to a large volume of training samples, efficient data collection and labeling strategies are developed for different data types and applications.
Finally, the performance of the proposed representations and models is evaluated by comparison with several state-of-the-art techniques in the literature. The experimental results clearly and consistently demonstrate the superiority of the proposed deep-learning based framework for each application. / PHD / The rapid growth of population and the permanent increase in the number of vehicles engender several issues in transportation systems, which in turn call for an intelligent and cost-effective approach to resolve the problems in an efficient manner. Furthermore, the recent advances in positioning tools (e.g., GPS sensors) and the ever-growing popularity of social media networks have enabled generation of massive spatiotemporal and crowdsourced data. This dissertation aims to leverage the advances in artificial intelligence so as to unlock the rich knowledge in the recorded data and, in turn, optimize the transportation systems in a cost-effective way. In particular, this dissertation seeks to propose end-to-end frameworks based on deep learning models, as an advanced branch of artificial intelligence, as well as spatiotemporal and crowdsourced datasets (e.g., GPS trajectory and social media) for addressing three transportation problems. (1) Travel Mode Detection, which is defined as identifying users’ transportation mode(s) (e.g., walk, bike, bus, car, and train) when traveling around the traffic network. (2) Vehicle Classification, which is defined as identifying the vehicle’s type (e.g., passenger car and truck) while moving in a traffic network. (3) Traffic Information System based on social media networks, which is defined as detecting traffic events (e.g., crash) and capturing traffic information (e.g., traffic congestion) on a real-time basis from users’ tweets. The experimental results clearly and consistently demonstrate the superiority of the proposed deep-learning based framework for each application.
89

Precise Identification of Neurological Disorders using Deep Learning and Multimodal Clinical Neuroimaging

Park, David Keetae January 2024
Neurological disorders present a significant challenge in global health. With the increasing availability of imaging datasets and the development of precise machine learning models, early and accurate diagnosis of neurological conditions is a promising and active area of research. However, several characteristic factors in neurology domains, such as heterogeneous imaging, inaccurate labels, or limited data, act as bottlenecks in using deep learning on clinical neuroimaging. Given these circumstances, this dissertation attempts to provide a guideline, proposing several methods and showcasing successful implementations in broad neurological conditions, including epilepsy and neurodegeneration. Methodologically, a particular focus is on comparing a two-dimensional approach as opposed to three-dimensional neural networks. In most clinical domains of neurological disorders, data are scarce and signals are weak, discouraging the use of 3D representation of raw scan data. This dissertation first demonstrates competitive performances with 2D models in tuber segmentation and AD comorbidity detection. Second, the potentials of ensemble learning are explored, further justifying the use of 2D models in the identification of neurodegeneration. Lastly, CleanNeuro is introduced in the context of 2D classification, a novel algorithm for denoising the datasets prior to training. CleanNeuro, on top of 2D classification and ensemble learning, demonstrates the feasibility of accurately classifying patients with comorbid AD and cerebral amyloid angiopathy from AD controls. Methods presented in this dissertation may serve as exemplars in the study of neurological disorders using deep learning and clinical neuroimaging. Clinically, this dissertation contributes to improving automated diagnosis and identification of regional vulnerabilities of several neurological disorders on clinical neuroimaging using deep learning. 
First, the classification of patients with Alzheimer’s disease from a cognitively normal group demonstrates the potential of using positron emission tomography with tau tracers as a competitive biomarker for precision medicine. Second, the segmentation of tubers in patients with tuberous sclerosis complex proves a successful 2D modeling approach in quantifying neurological burden of a rare yet deadly disease. Third, the detection of comorbid pathologies from patients with Alzheimer’s disease is analyzed and discussed in depth. Based on prior findings that comorbidities of Alzheimer’s disease affect the brain structure in a distinctive pattern, this dissertation proves for the first time the effectiveness of using deep learning on the accurate identification of comorbid pathology in vivo. Leveraging postmortem neuropathology as ground truth labels on top of the proposed methods records competitive performances in comorbidity prediction. Notably, this dissertation discovers that structural magnetic resonance imaging is a reliable biomarker in differentiating the comorbid cerebral amyloid angiopathy from Alzheimer’s disease patients. The dissertation discusses experimental findings on a wide range of neurological disorders, including tuberous sclerosis complex, dementia, and epilepsy. These results contribute to better decision-making on building neural network models for understanding and managing neurological diseases. Through this thorough exploration, the dissertation may provide valuable insights that can push forward research in clinical neurology.
90

A Multi-modal Emotion Recognition Framework Through The Fusion Of Speech With Visible And Infrared Images

Siddiqui, Mohammad Faridul Haque 29 August 2019
No description available.
