131 |
Transfer Learning and Hyperparameter Optimisation with Convolutional Neural Networks for Fashion Style Classification and Image Retrieval
Alishev, Andrey January 2024
The thesis explores the application of Convolutional Neural Networks (CNNs) in the fashion industry, focusing on fashion style classification and image retrieval. Employing transfer learning, the study investigates the effectiveness of fine-tuning pre-trained CNN models to adapt them for a specific fashion recognition task, after first performing an extensive hyperparameter optimisation with the Optuna framework. The impact of dataset size on model performance was examined by comparing the accuracy of models trained on datasets containing 2000 and 8000 images. Results indicate that larger datasets significantly improve model performance, particularly for more complex models like EfficientNetV2S, which showed the best overall performance with an accuracy of 85.38% on the larger dataset after fine-tuning. The best-performing fine-tuned model was subsequently used for image retrieval, with features extracted from its last convolutional layer. These features were used in a cosine similarity measure to rank images by their similarity to a query image. This technique achieved a mean average precision (mAP) of 0.4525, indicating that CNNs hold promise for enhancing fashion retrieval systems, although further improvements and validations are necessary. Overall, this research highlights the versatility of CNNs in interpreting and categorising complex visual data. The importance of well-prepared, targeted data and refined model training strategies is highlighted to enhance the accuracy and applicability of AI in diverse fields.
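The retrieval step described above (rank gallery images by cosine similarity to a query, then score the ranking with average precision) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the two-dimensional feature vectors, the style labels, and the function names are hypothetical, and the average-precision formula is one common definition that assumes every relevant image is in the gallery.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, feat) for feat in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def average_precision(ranked_labels, query_label):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical toy data standing in for CNN features and style labels.
gallery = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.8]]
labels = ["casual", "formal", "casual", "formal"]
query = [1.0, 0.1]

order = rank_by_similarity(query, gallery)
ap = average_precision([labels[i] for i in order], "casual")
print(order, ap)
```

Mean average precision (mAP) is then the mean of this per-query value over all query images.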
|
132 |
Addressing Challenges of Modern News Agencies via Predictive Modeling, Deep Learning, and Transfer Learning
Keneshloo, Yaser 22 July 2019
Today's news agencies are moving from traditional journalism, where publishing just a few news articles per day was sufficient, to modern content generation mechanisms, which create thousands of news pieces every day.
With the growth of these modern news agencies comes the arduous task of properly handling this massive amount of data that is generated for each news article.
Therefore, news agencies are constantly seeking solutions to facilitate and automate some of the tasks that have been previously done by humans.
In this dissertation, we focus on two broad problems whose solutions help a news agency not only gain a wider view of reader behaviour around an article but also provide automated tools that ease the job of editors in summarizing news articles.
These two disjoint problems aim at improving the users' reading experience by helping the content generator monitor and focus on poorly performing content while allowing them to promote the well-performing content.
We first focus on the task of popularity prediction of news articles via a combination of regression, classification, and clustering models.
We next focus on the problem of generating automated text summaries for a long news article using deep learning models.
The first problem aims at helping the content developer understand how a news article is performing over the long run, while the second provides automated tools for content developers to generate summaries for each news article. / Doctor of Philosophy / Nowadays, each person is exposed to an immense amount of information from social media, blog posts, and online news portals. Among these sources, news agencies are one of the main content providers for people around the world. Contemporary news agencies are moving from traditional journalism to modern techniques on several fronts. This is achieved either by building smart tools to track readers' reactions to a specific news article or by providing automated tools that facilitate the editor's job of delivering higher-quality content to readers. These systems should not only scale well with the growth in readership but also process ad-hoc requests, precisely because most of the policies and decisions in these agencies are based on the results of these analytical tools. As part of this movement towards adopting new technologies for smart journalism, we have worked with The Washington Post on tools for predicting the popularity of a news article and on an automated text summarization model. We develop a model that monitors each news article after its publication and predicts the number of views the article will receive within the next 24 hours. This model will help the content creator not only to promote potentially viral articles on the main page of the web portal or on social media, but also to give editors intuition about potentially poorly performing articles so that they can edit the content of those articles for better exposure.
On the other hand, current news agencies generate more than a thousand news articles per day, and writing three to four summary sentences for each of these pieces will not only become infeasible in the near future but is also very expensive and time-consuming. Therefore, we also develop a separate model for automated text summarization, which generates summary sentences for a news article. Our model generates summaries by selecting the most salient sentences in the news article and paraphrasing them into shorter sentences that can serve as a summary of the entire document.
|
133 |
A 3D Deep Learning Architecture for Denoising Low-Dose CT Scans
Kasparian, Armen Caspar 11 April 2024
This paper introduces 3D-DDnet, a cutting-edge 3D deep learning (DL) framework designed to improve the image quality of low-dose computed tomography (LDCT) scans. Although LDCT scans are advantageous for reducing radiation exposure, they inherently suffer from reduced image quality. Our novel 3D DL architecture addresses this issue by effectively enhancing LDCT images to achieve parity with the quality of standard-dose CT scans. By exploiting the inter-slice correlation present in volumetric CT data, 3D-DDnet surpasses existing denoising benchmarks. It incorporates distributed data parallel (DDP) and transfer learning techniques to significantly accelerate the training process. The DDP approach is particularly tailored for operation across multiple Nvidia A100 GPUs, facilitating the processing of large-scale volumetric data sets that were previously unmanageable due to size constraints. Comparative analyses demonstrate that 3D-DDnet reduces the mean square error (MSE) by 10% over its 2D counterpart, 2D-DDnet. Moreover, by applying transfer learning from pre-trained 2D models, 3D-DDnet effectively 'jump starts' the learning process, cutting training times by half without compromising on model accuracy. / Master of Science / This research focuses on improving the quality of low-dose CT scans using advanced technology. CT scans are medical imaging techniques used to see inside the body. Low-dose CT (LDCT) scans use less radiation than standard CT scans, making them safer, but the downside is that the images are not as clear. To solve this problem, we developed a new deep learning method to make these low-dose images clearer and as good as regular CT scans.
Our approach, called 3D-DDnet, is unique because it looks at the scans in 3D, considering how slices of the scan are related, which helps remove the noise and improve the image quality. Additionally, we used a technique called distributed data parallel (DDP) with advanced GPUs (graphics processing units, which are powerful computer components) to speed up the training of our system. This means our method can learn to improve images faster and work with larger data sets than before. Our results are promising: 3D-DDnet improved the image quality of low-dose CT scans significantly better than previous methods. Also, by using what we call "transfer learning" (starting with a pre-made model and adapting it), we cut the training time in half without losing accuracy. This development is essential for making low-dose CT scans more effective and safer for patients.
|
134 |
Achieving More with Less: Learning Generalizable Neural Networks With Less Labeled Data and Computational Overheads
Bu, Jie 15 March 2023
Recent advancements in deep learning have demonstrated its incredible ability to learn generalizable patterns and relationships automatically from data in a number of mainstream applications. However, the generalization power of deep learning methods largely comes at the cost of working with very large datasets and using highly compute-intensive models. Many applications cannot afford the costs needed to ensure generalizability of deep learning models. For instance, obtaining labeled data can be costly in scientific applications, and using large models may not be feasible in resource-constrained environments involving portable devices. This dissertation aims to improve efficiency in machine learning by exploring different ways to learn generalizable neural networks that require less labeled data and fewer computational resources. We demonstrate that using physics supervision in scientific problems can reduce the need for labeled data, thereby improving data efficiency without compromising model generalizability. Additionally, we investigate the potential of transfer learning powered by transformers in scientific applications as a promising direction for further improving data efficiency. On the computational efficiency side, we present two efforts for increasing parameter efficiency of neural networks through novel architectures and structured network pruning. / Doctor of Philosophy / Deep learning is a powerful technique that can help us solve complex problems, but it often requires a lot of data and resources. This research aims to make deep learning more efficient, so it can be applied in more situations. We propose ways to make deep learning models require less data and less computing power. For example, we use physical laws as additional information so that the neural network can learn from less labeled data, and we use a technique called transfer learning to leverage knowledge from data drawn from other distributions.
Transfer learning may allow us to further reduce the need for labeled data in scientific applications. We also look at ways to make deep learning models use fewer computational resources by effectively reducing their size via novel architectures or by pruning redundant structures.
|
135 |
Enabling EMG-based Silent Speech Transcription Through Speech-To-Text Transfer Learning
Garcia, Alexander T 01 September 2024
In recent years, advances in deep learning have allowed various forms of electrographic signals, such as electroencephalography (EEG) and electromyography (EMG), to be used as a viable form of input in artificial intelligence applications, particularly in the medical field. One area in which EMG inputs have been used is silent speech interfaces, or devices capable of processing speech without an audio-based input. The goal of this thesis is to explore a novel method of training a machine learning model for silent speech interface development: using transfer learning to leverage a pre-trained speech recognition model for classifying EMG-based silent speech inputs.
To accomplish this, we pass labeled EMG data through a custom transformation process, turning the data into musical notes that represent changes in an EMG sensor as silent speech data is captured. This transformed data is used as input to a pre-trained speech recognition model, and the model's classification layers are retrained to better fit the incoming data. The custom transformation process and model demonstrated progress towards effective classification with a small, closed-vocabulary dataset but showed no signs of effective training with a larger, open-vocabulary dataset. The effectiveness on the small closed-vocabulary dataset demonstrates that training a model to recognize EMG data using transfer learning on a pre-trained speech-to-text model is a viable approach.
|
136 |
Non-linguistic Notions in Language Modeling: Learning, Retention, and Applications
Sharma, Mandar 11 September 2024
Language modeling, especially through the use of transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and phrase, these models have yet to achieve the same proficiency in their ability to reason quantitatively. This deficiency is especially apparent in smaller models (fewer than 1 billion parameters) that can run natively on-device. Between the complementary capabilities of qualitative and quantitative reasoning, this thesis focuses on the latter, where the goal is to devise mechanisms to instill quantitative reasoning capabilities into these models. However, instilling this notion is not as straightforward as traditional end-to-end learning. The learning of quantitative notions includes the ability of the model to discern between regular linguistic tokens and magnitude/scale-oriented non-linguistic tokens. The learning of these notions, especially after pre-training, comes at a cost for these models: catastrophic forgetting. Thus, learning needs to be followed by retention - making sure these models do not forget what they have learned. We first motivate the need for numeracy-enhanced models via their potential applications in the field of data-to-text generation (D2T), showcasing how these models behave as quantitative reasoners as-is. Then, we devise both token-level training interventions and information-theoretic training interventions to numerically enhance these models, with the latter specifically focused on combating catastrophic forgetting. Our information-theoretic interventions not only lead to numerically enhanced models but also lend us critical insights into the learning behavior of these models, especially when it comes to adapting these models to the target task distribution from their pretraining distribution.
Finally, we extrapolate these insights to devise more effective strategies for transfer learning and unlearning in language modeling. / Doctor of Philosophy / Language modeling, especially through the use of transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and phrase, these models have yet to achieve the same proficiency in their ability to reason quantitatively. This deficiency is especially apparent in smaller models that can run natively on-device. This thesis focuses on instilling within these models the ability to perform quantitative reasoning - the ability to differentiate between words and numbers and understand the notions of magnitude tied to those numbers, while retaining their linguistic skills. The insights learned from our experiments are further used to devise models that better adapt to target tasks.
|
137 |
Pose Estimation and 3D Bounding Box Prediction for Autonomous Vehicles Through Lidar and Monocular Camera Sensor Fusion
Wale, Prajakta Nitin 08 August 2024
This thesis investigates the integration of transfer learning with ResNet-101 and compares its performance with VGG-19 for 3D object detection in autonomous vehicles. ResNet-101 is a deep Convolutional Neural Network with 101 layers, and VGG-19 is one with 19 layers. The research emphasizes the fusion of camera and lidar outputs to enhance the accuracy of 3D bounding box estimation, which is critical in occluded environments. Selecting an appropriate backbone for feature extraction is pivotal for achieving high detection accuracy. To address this challenge, we propose a method leveraging transfer learning with ResNet-101, pretrained on large-scale image datasets, to improve feature extraction capabilities. An averaging technique is applied to the outputs of these sensors to obtain the final bounding box. The experimental results demonstrate that the ResNet-101 based model outperforms the VGG-19 based model in terms of accuracy and robustness. This study provides valuable insights into the effectiveness of transfer learning and multi-sensor fusion in advancing 3D object detection for autonomous driving. / Master of Science / In the realm of computer vision, the quest for more accurate and robust 3D object detection pipelines remains an ongoing pursuit. This thesis investigates advanced techniques to improve 3D object detection by comparing two popular deep learning models, ResNet-101 and VGG-19. The study focuses on enhancing detection accuracy by combining the outputs from two distinct methods: one that uses a monocular camera to estimate 3D bounding boxes and another that employs lidar's bird's-eye view (BEV) data, converting it to image-based 3D bounding boxes. This fusion of outputs is critical in environments where objects may be partially obscured.
By leveraging transfer learning, a method in which models pre-trained on larger datasets are fine-tuned for a specific application, the research shows that ResNet-101 significantly outperforms VGG-19 in terms of accuracy and robustness. The approach involves averaging the outputs from both methods to refine the final 3D bounding box estimation. This work highlights the effectiveness of combining different detection methodologies and using advanced machine learning techniques to advance 3D object detection technology.
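The averaging step mentioned above can be illustrated with a short sketch. The abstract does not say how the boxes are parameterised or weighted, so everything here is an assumption made for illustration: each box is a hypothetical tuple `(x, y, z, length, width, height, yaw)`, both sources get equal weight, and the heading angle is averaged with a circular mean so that angles near ±π do not cancel out.

```python
import math


def average_boxes(box_a, box_b):
    """Equal-weight average of two boxes given as (x, y, z, length, width, height, yaw).

    The parameterisation and the equal weighting are illustrative assumptions.
    """
    # Centre coordinates and dimensions are averaged arithmetically.
    averaged = [(a + b) / 2.0 for a, b in zip(box_a[:6], box_b[:6])]
    # Yaw is averaged on the unit circle to handle wrap-around at +/- pi.
    yaw = math.atan2(
        math.sin(box_a[6]) + math.sin(box_b[6]),
        math.cos(box_a[6]) + math.cos(box_b[6]),
    )
    return tuple(averaged) + (yaw,)


# Hypothetical estimates of the same object from the camera and lidar branches.
camera_box = (10.0, 2.0, 0.5, 4.0, 1.8, 1.5, 0.10)
lidar_box = (10.4, 2.2, 0.7, 4.2, 1.6, 1.5, 0.20)

fused = average_boxes(camera_box, lidar_box)
print(fused)
```

A weighted average (for example, trusting lidar more for depth) would be a natural variation, but the abstract gives no indication of whether such weights were used.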
|
138 |
Building reliable machine learning systems for neuroscience
Buchanan, Estefany Kelly January 2024
Neuroscience as a field is collecting more data than at any other time in history. The scale of this data allows us to ask fundamental questions about the mechanisms of brain function, the basis of behavior, and the development of disorders. Our ambitious goals as well as the abundance of data being recorded call for reproducible, reliable, and accessible systems to push the field forward. While we have made great strides in building reproducible and accessible machine learning (ML) systems for neuroscience, reliability remains a major issue.
In this dissertation, we show that we can leverage existing data and domain expert knowledge to build more reliable ML systems to study animal behavior. First, we consider animal pose estimation, a crucial component in many scientific investigations. Typical transfer learning ML methods for behavioral tracking treat each video frame and object to be tracked independently. We improve on this by leveraging the rich spatial and temporal structures pervasive in behavioral videos. Our resulting weakly supervised models achieve significantly more robust tracking. Our tools allow us to achieve improved results when we have imperfect, limited data while requiring users to label fewer training frames and speeding up training. We can more accurately process raw video data and learn interpretable units of behavior. In turn, these improvements enhance performance on downstream applications.
Next, we consider a ubiquitous approach to (attempt to) improve the reliability of ML methods, namely combining the predictions of multiple models, also known as deep ensembling. Ensembles of classical ML predictors, such as random forests, improve metrics such as accuracy by well-understood mechanisms such as improving diversity. However, in the case of deep ensembles, there is an open methodological question as to whether, given the choice between a deep ensemble and a single neural network with similar accuracy, one model is truly preferable over the other. Via careful experiments across a range of benchmark datasets and deep learning models, we demonstrate limitations to the purported benefits of deep ensembles. Our results challenge common assumptions regarding the effectiveness of deep ensembles and the “diversity” principles underpinning their success, especially with regard to important metrics for reliability, such as out-of-distribution (OOD) performance and effective robustness. We conduct additional studies of the effects of using deep ensembles when certain groups in the dataset are underrepresented (so-called “long tail” data), a setting whose importance in neuroscience applications is revealed by our aforementioned work.
Altogether, our results demonstrate the essential importance of both holistic systems work and fundamental methodological work to understand the best ways to apply the benefits of modern machine learning to the unique challenges of neuroscience data analysis pipelines. To conclude the dissertation, we outline challenges and opportunities in building next-generation ML systems.
|
139 |
Online Unsupervised Domain Adaptation / Online-övervakad domänanpassning
Panagiotakopoulos, Theodoros January 2022
Deep Learning models have seen wide application in demanding tasks such as machine translation and autonomous driving. However, building such models has proved challenging, both from a computational perspective and because of the need for large amounts of annotated data. Moreover, when confronted with new situations or data distributions (the target domain), those models may perform inadequately. Examples include moving from one city to another, different weather conditions, or changes in sunlight. Unsupervised Domain Adaptation (UDA) exploits unlabelled data, which is easy to obtain, to adapt models to new conditions or data distributions. Inspired by the fact that environmental changes happen gradually, we focus on Online UDA. Instead of directly adjusting a model to a demanding condition, we constantly perform minor adaptations to every slight change in the data, creating a soft transition from the current domain to the target one. To perform gradual adaptation, we applied state-of-the-art semantic segmentation approaches to increasing rain intensities (25, 50, 75, 100, and 200 mm of rain). We demonstrate that deep learning models can adapt substantially better to hard domains when exploiting intermediate ones. Moreover, we introduce a model switching mechanism that allows adjusting back to the source domain, after adaptation, without dropping performance.
|
140 |
Multimodální zpracování dat a mapování v robotice založené na strojovém učení / Machine Learning-Based Multimodal Data Processing and Mapping in Robotics
Ligocki, Adam January 2021
The dissertation deals with the application of neural networks for object detection to multimodal data in robotics. It targets three areas in total: dataset creation, multimodal data processing, and neural network training. The most important part of the work is the design of a method for creating large annotated datasets without time-consuming human intervention. The method uses neural networks trained on RGB images. Using data from several sensors, it builds a model of the surroundings and maps annotations from RGB images onto another data domain, such as thermal images or point clouds. Using this method, the author created a dataset of several hundred thousand annotated images and used them to train a neural network, which subsequently outperformed models trained on smaller, human-annotated datasets. The author also examines the robustness of object detection in several data domains under various weather conditions. The thesis also describes the complete multimodal data processing pipeline that the author created during his doctoral studies. This includes the development of a unique sensory device equipped with a range of sensors commonly used in robotics. The author further describes the process of creating the large, publicly available Brno Urban Dataset. Finally, the author describes the software developed during his studies and how it is used for data processing in his work (Atlas Fusion and Robotic Template Library).
|