411

Semi-Supervised Gait Recognition

Mitra, Sirshapan 01 January 2024 (has links) (PDF)
In this work, we examine semi-supervised learning for gait recognition with a limited number of labeled samples. Our research focuses on two distinct limited-label regimes: 1) closed-set, with limited labeled samples per individual, and 2) open-set, with limited labeled individuals. We find that the open-set regime poses a greater challenge than the closed-set one; having more labeled ids matters more for performance than having more labeled samples per id. Since obtaining labeled samples for a large number of individuals is usually harder, the limited-id (open-set) setup, where most training samples belong to unknown ids, is the more important one to study. We further show that existing semi-supervised learning approaches are not well suited to scenarios where unlabeled samples belong to novel ids. We propose a simple prototypical self-training approach to solve this problem, integrating closed-set semi-supervised learning with self-training that can effectively utilize unlabeled samples from unknown ids. To further alleviate the challenge of limited labeled samples, we explore the role of synthetic data, using a diffusion model to generate samples from both known and unknown ids. We perform our experiments on two gait recognition benchmarks, CASIA-B and OUMVLP, and provide a comprehensive evaluation of the proposed method. The approach is effective and generalizable in both closed- and open-set settings. With merely 20% of the labeled samples, we achieve performance competitive with supervised methods using 100% of the labels, while outperforming existing semi-supervised methods.
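For illustration, a minimal sketch of the prototypical self-training step (the function names and confidence threshold are illustrative assumptions, not the thesis's implementation): identity prototypes are computed from the few labeled embeddings, and unlabeled samples are pseudo-labeled by their nearest prototype only when the match is confident.

```python
import numpy as np

def prototypical_self_training(z_lab, y_lab, z_unlab, thresh=0.7):
    """Pseudo-label unlabeled gait embeddings by nearest class prototype.

    z_lab:   (n_l, d) embeddings of labeled samples
    y_lab:   (n_l,)   integer identity labels
    z_unlab: (n_u, d) embeddings of unlabeled samples
    thresh:  cosine-similarity cutoff; below it a sample stays unlabeled
    """
    classes = np.unique(y_lab)
    # One prototype per identity: the mean of its labeled embeddings.
    protos = np.stack([z_lab[y_lab == c].mean(axis=0) for c in classes])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)

    z = z_unlab / np.linalg.norm(z_unlab, axis=1, keepdims=True)
    sims = z @ protos.T                  # cosine similarity to each prototype
    best = sims.argmax(axis=1)
    conf = sims.max(axis=1)

    keep = conf >= thresh                # confident samples get pseudo-labels
    return z_unlab[keep], classes[best[keep]]
```

Samples that fail the threshold are the open-set cases: they likely come from unknown ids and are the ones a self-training loop must accommodate rather than mislabel.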
412

A Machine Learning Approach to Recognize Environmental Features Associated with Social Factors

Diaz-Ramos, Jonathan 11 June 2024 (has links)
In this thesis, we aim to supplement the Climate and Economic Justice Screening Tool (CEJST), which assists federal agencies in identifying disadvantaged census tracts, by extracting five environmental features from Google Street View (GSV) images. The five features are garbage bags, greenery, and three distinct types of road damage (longitudinal, transverse, and alligator cracks), identified using image classification, object detection, and image segmentation. We evaluate three cities using the resulting feature space to distinguish disadvantaged from non-disadvantaged census tracts. The analysis reveals the significance of the feature space and demonstrates the time efficiency, detail, and cost-effectiveness of the proposed methodology. / Master of Science
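A hedged sketch of how such per-image detections might be aggregated into a tract-level feature space (the aggregation scheme and classifier are illustrative assumptions, not the thesis's exact pipeline):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Order of the five extracted environmental features.
FEATURES = ["garbage_bags", "greenery", "longitudinal", "transverse", "alligator"]

def tract_vector(image_hits):
    """image_hits: (n_images, 5) binary matrix; row i marks which of the five
    features the vision models found in GSV image i of one census tract.
    Returns the per-feature detection rate for that tract."""
    return image_hits.mean(axis=0)

# Toy example: two tracts, labels taken from CEJST (1 = disadvantaged).
X = np.stack([
    tract_vector(np.array([[1, 0, 1, 1, 0], [1, 0, 1, 0, 1]])),
    tract_vector(np.array([[0, 1, 0, 0, 0], [0, 1, 0, 0, 0]])),
])
y = np.array([1, 0])
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```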
413

Optimizations for Deep Learning-Based CT Image Enhancement

Chaturvedi, Ayush 04 March 2024 (has links)
Computed tomography (CT) combined with deep learning (DL) has recently shown great potential in biomedical imaging. Complex DL models with varying architectures inspired by the human brain are improving imaging software and aiding diagnosis. However, the accuracy of these DL models heavily relies on the datasets used for training, which often contain low-quality CT images from low-dose CT (LDCT) scans. Moreover, in contrast to the neural architecture of the human brain, today's DL models are dense and complex, resulting in a significant computational footprint. Therefore, in this work, we propose sparse optimizations to reduce the complexity of DL models and architecture-aware optimizations to reduce their total training time. To that end, we work with the DenseNet and Deconvolution Network (DDNet) model, which enhances LDCT chest images into high-quality (HQ) ones but requires many hours to train. To further improve the quality of the final HQ images, we first modify DDNet's architecture with a more robust multi-level VGG (ML-VGG) loss function to achieve state-of-the-art CT image enhancement. However, the improved loss function increases computational cost. Hence, we introduce sparse optimizations to reduce the complexity of the improved DL model and then propose architecture-aware optimizations to utilize the underlying computing hardware efficiently, reducing overall training time. Finally, we evaluate our techniques for performance and accuracy using state-of-the-art hardware resources. / Master of Science / Deep learning-based (DL) techniques that leverage computed tomography (CT) are becoming omnipresent in diagnosing diseases and abnormalities in different parts of the human body. However, their diagnostic accuracy depends directly on the quality of the CT images used to train the DL models, which is largely governed by the X-ray radiation dose of the CT scanner. DL-based techniques show promise for improving the quality of low-dose CT (LDCT) images, but they require substantial computational resources and time to train. Therefore, in this work, we incorporate algorithmic techniques inspired by the sparse neural architecture of the human brain to reduce the complexity of such DL models. To that end, we leverage the DenseNet and Deconvolution Network (DDNet) model, which enhances CT images produced with a low X-ray dose into high-quality CT images. However, due to its architecture, DDNet takes hours to train on state-of-the-art hardware. Hence, we propose techniques that use the hardware efficiently and reduce the time required to train DDNet. We evaluate the efficacy of our techniques on modern supercomputers in terms of speed and accuracy.
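As an illustration of the loss-function idea, here is a hedged sketch of a multi-level VGG perceptual loss; the layer indices, equal weights, and single-channel handling are assumptions, and the thesis's ML-VGG loss may differ:

```python
import torch
import torch.nn as nn
import torchvision

class MultiLevelVGGLoss(nn.Module):
    """Perceptual loss comparing feature maps at several depths of a frozen VGG16.

    The chosen levels (ReLU outputs of blocks 1-4) are illustrative only.
    """
    def __init__(self, levels=(3, 8, 15, 22), weights=(1.0, 1.0, 1.0, 1.0)):
        super().__init__()
        vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.levels = dict(zip(levels, weights))
        self.last = max(levels)

    def forward(self, enhanced, reference):
        # CT slices are single-channel; VGG expects 3-channel input.
        x = enhanced.repeat(1, 3, 1, 1)
        y = reference.repeat(1, 3, 1, 1)
        loss = 0.0
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.levels:
                loss = loss + self.levels[i] * nn.functional.mse_loss(x, y)
            if i == self.last:
                break                      # no need to run deeper layers
        return loss

loss_fn = MultiLevelVGGLoss()
l = loss_fn(torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64))
```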
414

Gated Transformer-Based Architecture for Automatic Modulation Classification

Sahu, Antorip 05 February 2024 (has links)
This thesis delves into the advancement of 5G portable test-nodes in wireless communication systems with cognitive radio capabilities, specifically addressing the critical need for dynamic spectrum sensing and awareness at the radio receiver through AI-driven automatic modulation classification. Our methodology is centered on a transformer encoder architecture incorporating multi-head self-attention. We train the architecture extensively across a diverse range of signal-to-noise ratios (SNRs) from the RadioML 2018.01A dataset. We introduce a novel transformer-based architecture with a gated mechanism, designed as a runtime-reconfigurable automatic modulation classification framework, which demonstrates enhanced performance on low-SNR RF signals during evaluation, an area where conventional methods have shown limitations, as corroborated by existing research. Our single-model framework employs distinct weight sets, activated by varying SNR levels, to enable a gating mechanism for more accurate modulation classification. This advancement marks a crucial step toward the evolution of smarter communication systems. / Master of Science / This thesis delves into the advancement of wireless communication systems, particularly in developing portable devices capable of effectively detecting and analyzing radio signals with cognitive radio capabilities. Central to our research is leveraging artificial intelligence (AI) for automatic modulation classification, a method to identify signal modulation types. We utilize a transformer-based AI model trained on the RadioML 2018.01A dataset. Using a gating mechanism based on signal-to-noise ratios, our approach is particularly effective on low-quality signals, an area previously considered challenging in existing research. This work marks a significant advancement toward more intelligent and responsive wireless communication systems.
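A toy sketch of the gating idea (the band edges, model sizes, and the assumption that an SNR estimate is available are all illustrative; the thesis's framework selects among trained weight sets at runtime):

```python
import torch
import torch.nn as nn

class SNRGatedClassifier(nn.Module):
    """One encoder architecture, several weight sets gated by SNR band."""
    def __init__(self, n_classes=24, d_model=64, n_bands=3):
        super().__init__()
        make_expert = lambda: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.experts = nn.ModuleList(make_expert() for _ in range(n_bands))
        self.proj = nn.Linear(2, d_model)        # RadioML frames are I/Q pairs
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, iq, snr_db):
        # Gate: pick the weight set trained for this SNR band (edges assumed).
        band = 0 if snr_db < 0 else (1 if snr_db < 10 else 2)
        h = self.experts[band](self.proj(iq))    # (batch, seq, d_model)
        return self.head(h.mean(dim=1))          # pool over time, classify

model = SNRGatedClassifier()
logits = model(torch.randn(8, 1024, 2), snr_db=6.0)
```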
415

Consumer-Centric Innovation for Mobile Apps Empowered by Social Media Analytics

Qiao, Zhilei 20 June 2018 (has links)
Due to the rapid development of Internet communication technologies (ICTs), an increasing number of social media platforms exist where consumers can exchange comments online about the products and services that businesses offer. The existing literature has demonstrated that online user-generated content can significantly influence consumer behavior and increase sales. However, research on its impact on organizational operations has focused primarily on marketing, leaving other areas understudied. Hence, there is a pressing need for a research framework that explores the impact of online user-generated content on important organizational operations such as product innovation, customer relationship management, and operations management. Research efforts in this dissertation center on the co-creation value of online consumer reviews, where consumers' demands influence firms' decision-making. The dissertation is composed of three studies. The first study finds empirical evidence that quality signals in online product reviews predict the timing of firms' incremental innovation. Guided by product differentiation theory, the second study examines how companies' innovation and marketing differentiation strategies influence app performance. The last study proposes a novel text analytics framework to discover different information types in user reviews. The research contributes theoretical and practical insights to the consumer-centric innovation and social media analytics literature. / PHD / The IT industry, and especially the mobile application (app) market, is intensely competitive and propelled by rapid innovation. The number of apps downloaded worldwide is 102,062 million, generating $88.3 billion in revenue, and projections suggest this will rise to $189 billion in 2020. Hence, there is an impetus to examine the competitive strategies of app makers to better understand how this important market functions. The app update is an important competitive strategy. The first study investigates what types of public information from both customers and app makers can be used to predict app makers' updating decisions. The findings indicate that customer-provided information influences app makers' updating decisions; the study thus provides insights into the importance of customer-centric strategy for market players. The second study explores the impact of product differentiation strategies on app performance in the mobile app marketplace. The results indicate that product updates, which the first study showed are influenced by consumer feedback, are a vertical product differentiation strategy that affects app performance. Together, the two studies illustrate the importance of integrating online customer feedback into companies' technology strategy. Finally, the third study proposes a novel framework that applies a domain-adapted deep learning approach to categorize and summarize two types of innovation opportunities (i.e., feature requests) embedded in app reviews. The results show that the proposed classification approach outperforms traditional algorithms.
416

Digital Phenotyping and Genomic Prediction Using Machine and Deep Learning in Animals and Plants

Bi, Ye 03 October 2024 (has links)
This dissertation investigates the utility of deep learning and machine learning approaches for livestock management and for quantitative genetic modeling of rice grain size under climate change. Monitoring the live body weight of animals is crucial to support farm management decisions due to its direct relationship with animal growth, nutritional status, and health. However, conventional manual weighing methods are time-consuming and can cause stress to animals. While there is a growing trend toward using three-dimensional cameras coupled with computer vision techniques to predict animal body weight, validation with deep learning models and with large-scale data collected in commercial environments is still limited. Therefore, the first two research chapters show how deep learning-based computer vision systems can enable accurate live body weight prediction for dairy cattle and pigs. These studies also address the challenges of managing large, complex phenotypic data and highlight the potential of deep learning models to automate data processing and improve prediction accuracy in an industry-scale commercial setting. The dissertation then shifts focus to crop resilience, particularly in rice, where the asymmetric increase in average nighttime temperatures relative to average daytime temperatures under climate change is reducing grain yield and quality. Using deep learning and machine learning models, the last two chapters explore how metabolic data can be used in quantitative genetic modeling of rice under environmental stress conditions such as high night temperatures. These studies showed that integrating metabolites and genomics improved the prediction of rice grain-size-related traits, and certain metabolites were identified as potential candidates for improving multi-trait genomic prediction. Further research showed that metabolic accumulation was low to moderately heritable, and genomic prediction accuracies were consistent with expected genomic heritability estimates. Genomic correlations between control and high night temperature conditions indicated genotype-by-environment interactions in metabolic accumulation, and the effectiveness of genomic prediction models for metabolic accumulation varied across metabolites. Joint analysis of multiple metabolites improved the accuracy of genomic prediction by exploiting correlations among metabolite accumulations. Overall, this dissertation highlights the potential of integrating digital technologies and multi-omic data to advance data analytics in agriculture, with applications in livestock management and quantitative genetic modeling of rice. / Doctor of Philosophy / This dissertation explores the application of deep learning and machine learning to computer vision-based livestock management and to quantitative genetic modeling of rice grain size under climate change. The first half of the research chapters illustrates how computer vision systems can enable digital phenotyping of dairy cows and pigs, which is critical for informed management decisions and quantitative genetic analysis. These studies address the challenges of managing large-scale, complex phenotypic data and highlight the potential of deep learning models to automate data processing and improve prediction accuracy. Chapter 3 showed that a deep learning-based segmentation model, Mask R-CNN, improved the prediction of cow body weight from longitudinal depth video data. Among the image features, volume followed by width correlated best with body weight. Chapter 4 showed that efficient deep learning-based supervised learning models are a promising approach for predicting pig body weight from industry-scale depth video data. Although the sparse design, which simulates budget and time constraints by using a subset of the data for training, resulted in some performance loss compared to the full design, the Vision Transformer models effectively mitigated this loss. The second half of the research chapters focuses on integrating metabolomic and genomic data to predict grain traits and metabolic content in rice under climate change. Using machine learning models, these studies investigate how combining genomic and metabolic data can improve predictions, particularly under high-night-temperature stress. Chapter 5 showed that integrating metabolites and genomics improved the prediction of rice grain-size-related traits, and certain metabolites were identified as potential candidates for improving multi-trait genomic prediction. Chapter 6 showed that metabolic accumulation was low to moderately heritable. Genomic correlations between control and high night temperature conditions indicated genotype-by-environment interactions in metabolic accumulation, and the effectiveness of genomic prediction models for metabolic accumulation varied across metabolites. Joint analysis of multiple metabolites improved the accuracy of genomic prediction by exploiting correlations among metabolite accumulations. Overall, the dissertation provides insight into how cutting-edge methods can be used to improve livestock management and multi-omic quantitative genetic modeling for breeding.
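As a hedged illustration of the multi-omic comparison, the sketch below contrasts genomics-only prediction with genomics-plus-metabolites; ridge regression stands in for the thesis's genomic prediction models, and the data are synthetic:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p_snp, p_met = 200, 1000, 50            # accessions, SNP markers, metabolites

G = rng.choice([0, 1, 2], size=(n, p_snp)).astype(float)   # genotype matrix
M = rng.normal(size=(n, p_met))                            # metabolite levels
y = G[:, :10].sum(axis=1) + M[:, 0] + rng.normal(size=n)   # toy grain-size trait

# Genomics-only vs. genomics + metabolites, echoing the thesis's comparison.
for name, X in [("G", G), ("G+M", np.hstack([G, M]))]:
    r2 = cross_val_score(Ridge(alpha=100.0), X, y, cv=5, scoring="r2").mean()
    print(name, round(r2, 3))
```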
417

Enhancing Surgical Gesture Recognition Using Bidirectional LSTM and Evolutionary Computation: A Machine Learning Approach to Improving Robotic-Assisted Surgery / BiLSTM and Evolutionary Computation for Surgical Gesture Recognition

Zhang, Yifei January 2024 (has links)
The integration of artificial intelligence (AI) and machine learning in the medical field has led to significant advancements in surgical robotics, particularly in enhancing the precision and efficiency of surgical procedures. This thesis investigates the application of a single-layer bidirectional Long Short-Term Memory (BiLSTM) model to the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) dataset, aiming to improve the recognition and classification of surgical gestures. The BiLSTM model, with its capability to process data in both forward and backward directions, offers a comprehensive analysis of temporal sequences, capturing intricate patterns within surgical motion data. This research explores the potential of BiLSTM models to outperform traditional unidirectional models in the context of robotic surgery. In addition to the core model development, this study employs evolutionary computation techniques for hyperparameter tuning, systematically searching for optimal configurations to enhance model performance. The evaluation metrics include training and validation loss, accuracy, confusion matrices, prediction time, and model size. The results demonstrate that the BiLSTM model with evolutionary hyperparameter tuning achieves superior performance in recognizing surgical gestures compared to standard LSTM models. The findings of this thesis contribute to the broader field of surgical robotics and human-AI partnership by providing a robust method for accurate gesture recognition, which is crucial for assessing and training surgeons and advancing automated and assistive technologies in surgical procedures. The improved model performance underscores the importance of sophisticated hyperparameter optimization in developing high-performing deep learning models for complex sequential data analysis. / Thesis / Master of Applied Science (MASc) / Advancements in artificial intelligence (AI) are transforming medicine, particularly in robotic surgery. This thesis focuses on improving how robots recognize and classify surgeons' movements during operations. Using a special AI model called a bidirectional Long Short-Term Memory (BiLSTM) network, which looks at data both forwards and backwards, the study aims to better understand and predict surgical gestures. By applying this model to a dataset of surgical tasks, specifically suturing, and optimizing its settings with advanced techniques, the research shows significant improvements in accuracy and efficiency over traditional methods. The enhanced model is not only more accurate but also smaller and faster. These improvements can help train surgeons more effectively and advance robotic assistance in surgeries, leading to safer and more precise operations, ultimately benefiting both surgeons and patients.
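A minimal sketch of the core model, assuming the 76-dimensional JIGSAWS kinematic feature vector and 15 gesture classes; the hidden size is exactly the kind of hyperparameter the evolutionary search would tune:

```python
import torch
import torch.nn as nn

class BiLSTMGesture(nn.Module):
    """Single-layer BiLSTM over kinematic sequences, one gesture label per window."""
    def __init__(self, n_features=76, hidden=128, n_gestures=15):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=1,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_gestures)  # forward + backward states

    def forward(self, x):                  # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])    # classify from the last time step

logits = BiLSTMGesture()(torch.randn(4, 150, 76))
```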
418

Deep Autofocusing for Digital Pathology Whole Slide Imaging

Li, Qiang January 2024 (has links)
The quality of clinical pathology is a critical index for evaluating a nation's healthcare level. Recently developed digital pathology techniques can transform pathology slides into digital whole slide images (WSI). This transformation facilitates data storage, online transmission, real-time viewing, and remote consultations, significantly improving clinical diagnosis. The effectiveness and efficiency of digital pathology imaging often hinge on the precision and speed of autofocusing. However, autofocusing pathology images is challenging under constraints including uneven focus distribution and limited depth of field (DoF). Current autofocusing methods, such as those relying on image stacks, require substantial time and resources to capture and process images. Autofocusing based on reflective hardware systems, despite its efficiency, incurs significant hardware costs and lacks system compatibility. Finally, machine learning-based autofocusing can circumvent repetitive mechanical movements and camera shots, but a simplistic end-to-end implementation that does not account for the imaging process falls short of delivering satisfactory focus prediction and in-focus image restoration. In this thesis, we present three distinct autofocusing techniques for defocused pathology images: (1) Aberration-aware Focal Distance Prediction leverages the asymmetric effects of optical aberrations, making it ideal for focus prediction in focus-map scenarios; (2) Dual-shot Deep Autofocusing with a Fixed Offset Prior merges two images taken at fixed, different defocus distances, ensuring heightened accuracy of in-focus image restoration for fast offline situations; (3) Semi-blind Deep Restoration of Defocus Images uses multi-task joint prediction guided by the PSF, enabling high-efficiency, single-pass scanning for offline in-focus image restoration. / Thesis / Doctor of Philosophy (PhD)
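For context, a sketch of the classical image-stack baseline the thesis improves on: capture a focal stack, score each image with a sharpness measure, and pick the sharpest plane. Deep autofocusing avoids exactly this repeated capture-and-score loop (the variance-of-Laplacian measure here is a standard focus metric, not the thesis's method):

```python
import cv2
import numpy as np

def sharpness(img):
    """Variance of the Laplacian: a classical focus measure; higher = sharper."""
    return cv2.Laplacian(img, cv2.CV_64F).var()

def best_focus(stack, z_positions):
    """Pick the z position whose image is sharpest from a captured focal stack.

    This is the slow image-stack approach: every candidate plane must be
    captured and scored, which is what learned focal-distance prediction
    and in-focus restoration are designed to avoid.
    """
    scores = [sharpness(img) for img in stack]
    return z_positions[int(np.argmax(scores))]
```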
419

Think outside the Black Box: Model-Agnostic Deep Learning with Domain Knowledge

Kobs, Konstantin January 2024 (has links) (PDF)
Deep Learning (DL) models are trained on a downstream task by feeding (potentially preprocessed) input data through a trainable Neural Network (NN) and updating its parameters to minimize the loss function between the predicted and the desired output. While this general framework has mainly remained unchanged over the years, the architectures of the trainable models have greatly evolved. Even though it is undoubtedly important to choose the right architecture, we argue that it is also beneficial to develop methods that address other components of the training process. We hypothesize that utilizing domain knowledge can be helpful to improve DL models in terms of performance and/or efficiency. Such model-agnostic methods can be applied to any existing or future architecture. Furthermore, the black box nature of DL models motivates the development of techniques to understand their inner workings. Considering the rapid advancement of DL architectures, it is again crucial to develop model-agnostic methods. In this thesis, we explore six principles that incorporate domain knowledge to understand or improve models. They are applied either on the input or output side of the trainable model. Each principle is applied to at least two DL tasks, leading to task-specific implementations. To understand DL models, we propose to use Generated Input Data coming from a controllable generation process requiring knowledge about the data properties. This way, we can understand the model’s behavior by analyzing how it changes when one specific high-level input feature changes in the generated data. On the output side, Gradient-Based Attribution methods create a gradient at the end of the NN and then propagate it back to the input, indicating which low-level input features have a large influence on the model’s prediction. The resulting input features can be interpreted by humans using domain knowledge. To improve the trainable model in terms of downstream performance, data and compute efficiency, or robustness to unwanted features, we explore principles that each address one of the training components besides the trainable model. Input Masking and Augmentation directly modifies the training input data, integrating knowledge about the data and its impact on the model’s output. We also explore the use of Feature Extraction using Pretrained Multimodal Models which can be seen as a beneficial preprocessing step to extract useful features. When no training data is available for the downstream task, using such features and domain knowledge expressed in other modalities can result in a Zero-Shot Learning (ZSL) setting, completely eliminating the trainable model. The Weak Label Generation principle produces new desired outputs using knowledge about the labels, giving either a good pretraining or even exclusive training dataset to solve the downstream task. Finally, improving and choosing the right Loss Function is another principle we explore in this thesis. Here, we enrich existing loss functions with knowledge about label interactions or utilize and combine multiple task-specific loss functions in a multitask setting. We apply the principles to classification, regression, and representation tasks as well as to image and text modalities. We propose, apply, and evaluate existing and novel methods to understand and improve the model. Overall, this thesis introduces and evaluates methods that complement the development and choice of DL model architectures. 
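The Gradient-Based Attribution principle in its simplest form, as a hedged sketch (plain input saliency; the thesis applies such methods per task, and the toy model here is only for demonstration):

```python
import torch
import torch.nn as nn

def saliency(model, x, target):
    """Gradient of the target logit w.r.t. the input: which low-level input
    features most influence the model's prediction."""
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target]        # scalar logit for the class of interest
    score.backward()                   # populate x.grad by backpropagation
    return x.grad.abs()                # magnitude = influence on the prediction

# Toy usage: a linear "classifier" over 10 input features.
net = nn.Linear(10, 3)
heat = saliency(net, torch.randn(1, 10), target=2)
```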
420

Optimizing Neural Network Models for Healthcare and Federated Learning

Verardo, Giacomo January 2024 (has links)
Neural networks (NNs) have demonstrated considerable capabilities in tackling tasks in a diverse set of fields, including natural language processing, image classification, and regression. In recent years, the amount of data available to train Deep Learning (DL) models has increased tremendously, requiring larger and larger models to learn the underlying patterns in the data. Inference time, communication cost in the distributed case, required storage, and computational demands have grown in proportion to model size, making NNs less suitable for two cases: i) tasks requiring low inference time (e.g., real-time monitoring) and ii) training on low-powered devices. These two cases, which have become crucial in the last decade due to the pervasiveness of low-powered devices and of NN models, are addressed in this licentiate thesis. As the first contribution, we analyze the distributed case with multiple low-powered devices in a federated scenario. Cross-device Federated Learning (FL) is a branch of Machine Learning (ML) where multiple participants train a common global model without sharing data in a centralized location. In this thesis, a novel technique named Coded Federated Dropout (CFD) is proposed to carefully split the global model into sub-models, increasing communication efficiency and reducing the burden on the devices with only a slight increase in training time. We showcase our results on an example image classification task. As the second contribution, we consider anomaly detection on electrocardiogram (ECG) recordings and show that including prior knowledge in NN models drastically reduces model size, inference time, and storage requirements for multiple state-of-the-art NNs. In particular, this thesis focuses on autoencoders (AEs), a subclass of NNs well suited to anomaly detection. We propose a novel approach, called FMM-Head, which incorporates basic knowledge of the ECG waveform shape into an AE. The evaluation shows that we improve the AUROC of baseline models while guaranteeing under-100 ms inference time, enabling real-time monitoring of ECG recordings from hospitalized patients. Finally, several directions for future work are presented. The inclusion of prior knowledge can be further exploited in ECG Imaging (ECGI), where hundreds of ECG sensors are used to reconstruct the 3D electrical activity of the heart. For ECGI, reducing the number of sensors employed (i.e., the input space) is also beneficial for reducing model size. Moreover, this thesis advocates additional techniques for integrating ECG anomaly detection in distributed and federated settings. / The research leading to this thesis is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Research Administration (ORA) under Award No. ORA-CRG2021-4699
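A toy sketch of the sub-model extraction behind federated dropout (the random unit selection here does not capture the coded, cross-client selection that gives CFD its name):

```python
import numpy as np

def extract_submodel(W, keep_frac, rng):
    """Drop a fraction of hidden units from a dense layer's weight matrix.

    Each client trains only the kept rows (and, in a full model, the matching
    columns of the next layer), shrinking both the download and the on-device
    compute. CFD's contribution is choosing these unit subsets carefully
    across clients, which this random version does not model.
    """
    k = int(W.shape[0] * keep_frac)
    idx = rng.choice(W.shape[0], size=k, replace=False)
    return idx, W[idx]                    # (kept unit ids, reduced weight matrix)

rng = np.random.default_rng(0)
W_global = rng.normal(size=(512, 256))    # hidden x input weights of one layer
idx, W_client = extract_submodel(W_global, 0.5, rng)

# After local training, the server writes the client's updated rows back:
W_global[idx] = W_client
```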
