1 |
Visual Saliency Analysis on Fashion Images Using Image Processing and Deep Learning Approaches
Neupane, Aashish 01 December 2020
ABSTRACT AASHISH NEUPANE, for the Master of Science degree in BIOMEDICAL ENGINEERING, presented on July 35, 2020, at Southern Illinois University Carbondale. TITLE: VISUAL SALIENCY ANALYSIS ON FASHION IMAGES USING IMAGE PROCESSING AND DEEP LEARNING APPROACHES. MAJOR PROFESSOR: Dr. Jun Qin. State-of-the-art computer vision technologies have been applied in fashion in multiple ways, and saliency modeling is one of those applications. In computer vision, a saliency map is a 2D topological map which indicates the probabilistic distribution of visual attention priorities. This study focuses on the analysis of visual saliency on fashion images using multiple saliency models, evaluated by several evaluation metrics. A human subject study was conducted to collect people’s visual attention on 75 fashion images. Binary ground-truth fixation maps for these images were created from the experimentally collected visual attention data using a Gaussian blurring function. Saliency maps for these 75 fashion images were generated using multiple conventional saliency models as well as deep feature-based state-of-the-art models. DeepFeat was studied extensively, with 44 sets of saliency maps, exploiting the features extracted from GoogLeNet and ResNet50. Seven other saliency models were also used to predict saliency maps on these images. The results were compared over five evaluation metrics: AUC, CC, KL Divergence, NSS and SIM. The performance of all eight saliency models in predicting visual attention on fashion images was comparable to the benchmarked scores over all five metrics. Furthermore, the models perform consistently well over multiple evaluation metrics, indicating that saliency models could in fact be applied to effectively predict salient regions in random fashion advertisement images.
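To make three of the metrics named in the abstract above concrete, here is a minimal sketch of NSS, CC and SIM in their commonly used forms. The abstract does not give formulas, so these definitions, the function names, and the tiny 2x2 maps are assumptions for illustration, not the thesis's implementation.

```python
import math


def _flatten(matrix):
    """Turn a 2D list into a flat list of values."""
    return [value for row in matrix for value in row]


def nss(saliency, fixations):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels."""
    s, f = _flatten(saliency), _flatten(fixations)
    mean = sum(s) / len(s)
    std = math.sqrt(sum((v - mean) ** 2 for v in s) / len(s))
    at_fixations = [(v - mean) / std for v, hit in zip(s, f) if hit]
    return sum(at_fixations) / len(at_fixations)


def cc(saliency, density):
    """Pearson linear correlation coefficient between two maps."""
    s, d = _flatten(saliency), _flatten(density)
    mean_s, mean_d = sum(s) / len(s), sum(d) / len(d)
    cov = sum((a - mean_s) * (b - mean_d) for a, b in zip(s, d))
    var_s = sum((a - mean_s) ** 2 for a in s)
    var_d = sum((b - mean_d) ** 2 for b in d)
    return cov / math.sqrt(var_s * var_d)


def sim(saliency, density):
    """Similarity: histogram intersection of two maps, each normalized to sum to 1."""
    s, d = _flatten(saliency), _flatten(density)
    total_s, total_d = sum(s), sum(d)
    return sum(min(a / total_s, b / total_d) for a, b in zip(s, d))


# Hypothetical toy data: a predicted saliency map and a binary fixation map.
predicted = [[0.1, 0.2], [0.3, 0.4]]
fixated = [[0, 0], [0, 1]]
print(nss(predicted, fixated), cc(predicted, predicted), sim(predicted, predicted))
```

NSS is computed against discrete fixation locations, while CC and SIM compare the prediction with a continuous (blurred) fixation density, which is one reason studies report several metrics side by side.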
|
2 |
Convnet features for age estimation
Bukar, Ali M., Ugail, Hassan 07 1900
Research in facial age estimation has been active for over a decade, owing to its numerous applications. Recently, convolutional neural networks (CNNs) have been used in an attempt to solve this age-old problem. For this purpose, researchers have proposed various CNN architectures. Unfortunately, most of the proposed techniques have been based on relatively ‘shallow’ networks. In this work, we leverage the capability of an off-the-shelf deep CNN model, namely the VGG-Face model, which has been trained on millions of face images. Interestingly, despite being a simple approach, features extracted from the VGG-Face model, when reduced and fed into linear regressors, outperform most of the state-of-the-art CNNs, e.g. on both the FGNET-AD and Morph II benchmark databases. Furthermore, rather than using only the last fully connected (FC) layer of the trained model, we evaluate the activations from different layers of the architecture. In fact, our experiments show that generic features learnt from intermediate layer activations carry more ageing information than the FC layers.
|
3 |
Generalization and Fairness Optimization in Pretrained Language Models
Ghanbar Zadeh, Somayeh 05 1900
This study introduces an effective method to address the generalization challenge in pretrained language models (PLMs), which affects their performance on diverse linguistic data beyond their training scope. Improving PLMs' adaptability to out-of-distribution (OOD) data is essential for their reliability and practical utility in real-world applications. Furthermore, we address the ethical imperative of fairness in PLMs, particularly as they become integral to decision-making in sensitive societal sectors. We introduce gender-tuning to identify and disrupt gender-related biases in training data. This method perturbs gendered terms, replacing them to break their associations with other words. Gender-tuning stands as a practical, ethical intervention against gender bias in PLMs. Finally, we present FairAgent, a novel framework designed to imbue small language models (SLMs) with fairness, drawing on the knowledge of large language models (LLMs) without incurring the latter's computational costs. FairAgent operates by enabling SLMs to consult with LLMs, harnessing their vast knowledge to guide the generation of less biased content. This dynamic system not only detects bias in SLM responses but also generates prompts to correct it, accumulating effective prompts for future use. Over time, SLMs become increasingly adept at producing fair responses, enhancing both computational efficiency and fairness in AI-driven interactions.
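The idea of perturbing gendered terms can be illustrated with a very small sketch. The abstract does not specify how terms are detected or replaced, so the swap table `GENDER_SWAPS` and the function `perturb_gendered_terms` below are hypothetical, and simple dictionary swapping is an assumed stand-in rather than the gender-tuning procedure itself.

```python
import re

# Hypothetical, deliberately tiny swap table; a real word list would be far larger
# and would need to handle ambiguous forms such as "her" (him/his).
GENDER_SWAPS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}

# Match only whole words, case-insensitively.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, GENDER_SWAPS)) + r")\b",
    re.IGNORECASE,
)


def perturb_gendered_terms(text):
    """Replace each listed gendered word with its counterpart, keeping capitalization."""
    def swap(match):
        word = match.group(0)
        replacement = GENDER_SWAPS[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement

    return _PATTERN.sub(swap, text)


print(perturb_gendered_terms("He thanked the woman."))
```

Training on both the original and the perturbed sentence is one plausible way such a perturbation could weaken the link between a gendered word and its context.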
|
4 |
End-to-end dialogové systémy s předtrénovanými jazykovými modely / End-to-end dialogue systems with pretrained language models
Kulhánek, Jonáš January 2021
Current dialogue systems typically consist of separate components, which are manually engineered to a large part and need extensive annotation. End-to-end trainable systems exist but produce lower-quality, unreliable outputs. The recent transformer-based pre-trained language models such as GPT-2 brought considerable progress to language modelling, but they rely on huge amounts of textual data, which are not available for common dialogue domains. Therefore, training these models runs a high risk of overfitting. To overcome these obstacles, we propose a novel end-to-end dialogue system called AuGPT. We add auxiliary training objectives to use training data more efficiently, and we use massive data augmentation via back-translation and pretraining on multiple datasets to increase data volume and diversity. We evaluate our system using automatic methods (corpus-based metrics, user simulation), human evaluation as part of the DSTC 9 shared task challenge (where our system placed 3rd out of 10), as well as extensive manual error analysis. Our method substantially outperforms the baseline on the MultiWOZ benchmark and shows competitive results with state-of-the-art end-to-end dialogue systems.
|
5 |
Improving the Accessibility of Arabic Electronic Theses and Dissertations (ETDs) with Metadata and Classification
Abdelrahman, Eman January 2021
Much research work has been done to extract data from scientific papers, journals, and articles. However, Electronic Theses and Dissertations (ETDs) remain an unexplored genre of data in the research fields of natural language processing and machine learning. Moreover, much of the related research involved data that is in the English language. Arabic data such as news and tweets have begun to receive some attention in the past decade. However, Arabic ETDs remain an untapped source of data despite the vast number of benefits to students and future generations of scholars. Some ways of improving the browsability and accessibility of data include data annotation, indexing, parsing, translation, and classification. Classification is essential for the searchability and management of data, which can be manual or automated. The latter is beneficial when handling growing volumes of data. There are two main roadblocks to performing automatic subject classification on Arabic ETDs. The first is the unavailability of a public corpus of Arabic ETDs. The second is the Arabic language’s linguistic complexity, especially in academic documents. This research presents the Otrouha project, which aims at building a corpus of key metadata of Arabic ETDs as well as providing a methodology for their automatic subject classification. The first goal is aided by collecting data from the AskZad Digital Library. The second goal is achieved by exploring different machine learning and deep learning techniques. The experiments’ results show that deep learning using pretrained language models gave the highest classification performance, indicating that language models significantly contribute to natural language understanding. / M.S. / An Electronic Thesis or Dissertation (ETD) is an openly-accessible electronic version of a graduate student’s research thesis or dissertation. 
It documents their main research effort that has taken place and becomes available in the University Library instead of a paper copy. Over time, collections of ETDs have been gathered and made available online through different digital libraries. ETDs are a valuable source of information for scholars and researchers, as well as librarians. With the digitalization move in most Middle Eastern Universities, the need to make Arabic ETDs more accessible significantly increases as their numbers increase. One of the ways to improve their accessibility and searchability is through providing automatic classification instead of manual classification. This thesis project focuses on building a corpus of metadata of Arabic ETDs and building a framework for their automatic subject classification. This is expected to pave the way for more exploratory research on this valuable genre of data.
|
6 |
Utilizing GPT for Interactive Dialogue-based Learning Scenarios : A Comparative Analysis with Rasa / Användande av GPT för interaktivt dialogbaserat lärande : En jämförelseanalys med Rasa
Björnsson, Valdimar January 2023
This thesis explores the use of advanced language models, specifically OpenAI’s Generative Pretrained Transformer (GPT), in the context of interactive tutoring systems built within a Unity-based game environment. The central problem addressed is whether recent advancements in large language models make them feasible and useful as tutors, specifically in providing meaningful, engaging, and educationally rich user interactions on a dialogue-based learning platform developed by Fictive Reality. The thesis also compares the effectiveness of GPT with that of the Rasa-built model that previously powered the learning platform. The importance of this problem lies in offering people learning opportunities that might not otherwise be available to them, and in seeing whether recent advancements in generative AI are sufficient for developing useful interactive AI tutors of soft skills. The Fictive Reality learning platform is powered by a Rasa model that generates appropriate responses to users in the context of roleplay-based learning scenarios while keeping an internal state of the progress of the dialogue. The project entails replacing this model with GPT and comparing the performance and respective merits of the two. We also explored the potential for a hybrid model that leverages the strengths of both systems, using Rasa for internal state tracking and for answering simpler queries, and GPT to handle those queries whose intent Rasa cannot determine. The first part of this project was integrating GPT with the existing functionality of the platform; this includes changes to the platform that allow people to create and play GPT-powered learning scenarios while adopting the existing features and user interface. Additionally, GPT was prompt-engineered to act as a tutor and to stay within the context of a learning environment. Changes had to be made to the platform so that the already existing features of Rasa scenarios could be replicated in GPT scenarios. 
Finally, there is a systematic comparison of the user experience and performance metrics when interacting with either a GPT or a Rasa chatbot in a learning scenario. Specifically, these metrics are determined from the conversational flow between bot and user, the context and continuity, finish rate, chit-chat handling, and average session length. The results suggest a distinct user preference for the GPT model due to its superior conversational capabilities, despite Rasa’s faster response times and state-tracking feature. The study suggests that GPT is sufficient for creating useful learning scenarios in restricted contexts. Therefore, we suggest that large language models can be leveraged in interactive learning systems, with potential impacts on edtech, AI in education, and conversational AI. / This degree project explores the use of advanced language models, particularly OpenAI’s Generative Pretrained Transformer (GPT), together with interactive tutoring systems built in a Unity-based game environment. The central problem addressed is whether it is feasible and useful to use GPT as a tutor. In addition, the effectiveness of GPT was compared with that of a more traditional model, Rasa, in providing meaningful, engaging, and educational interactions. This problem matters for offering people learning opportunities that might not otherwise be available to them, and for seeing whether recent advances in generative AI are sufficient for useful interactive tutoring of soft skills. The Fictive Reality learning platform is powered by a Rasa model that generates appropriate responses to users in the context of certain learning scenarios while keeping an internal state of the dialogue’s progress. The project aims to replace this model with GPT and to compare the performance of the two models. We also investigated the potential of a hybrid model that exploits the strengths of both systems by using Rasa for internal state tracking and for answering simpler questions, and GPT to handle the questions whose intent Rasa cannot determine. The first part of the project was to integrate GPT with the platform’s existing functionality; this includes changes to the platform that make it possible for people to create and play GPT-powered learning scenarios with the existing user interface and the features of Rasa-powered scenarios. Changes had to be made to the platform so that the existing features of Rasa scenarios could be replicated in GPT scenarios. Finally, a systematic comparison was made of performance and user experience when interacting with either a GPT or a Rasa chatbot in a learning scenario. The results indicate a distinct user preference for the GPT model owing to its superior conversational ability, despite Rasa’s faster response times and state-tracking feature. The study indicates that GPT is sufficient for creating useful learning scenarios in restricted contexts and that large language models can be used in interactive learning systems, with potential effects on edtech, AI in education, and conversational AI.
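The hybrid setup described in the abstract above, where a scripted model answers queries whose intent it recognizes and a generative model handles the rest, can be sketched as a simple router. Everything here is hypothetical: `IntentResult`, `route_message`, the stub classifier and the 0.6 confidence threshold are illustrative assumptions, and no real Rasa or OpenAI API is called.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class IntentResult:
    """Outcome of intent classification: a label (or None) and a confidence score."""
    intent: Optional[str]
    confidence: float


def route_message(
    message: str,
    classify_intent: Callable[[str], IntentResult],
    scripted_reply: Callable[[str], str],
    generative_reply: Callable[[str], str],
    threshold: float = 0.6,  # assumed cut-off, not taken from the thesis
) -> Tuple[str, str]:
    """Use the scripted bot when the intent is confidently known, else fall back."""
    result = classify_intent(message)
    if result.intent is not None and result.confidence >= threshold:
        return "scripted", scripted_reply(result.intent)
    return "generative", generative_reply(message)


def stub_classifier(message: str) -> IntentResult:
    """Stand-in for an intent classifier; recognizes only greetings."""
    if "hello" in message.lower():
        return IntentResult("greet", 0.95)
    return IntentResult(None, 0.0)


def stub_scripted(intent: str) -> str:
    return f"[scripted:{intent}]"


def stub_generative(message: str) -> str:
    return "[generated]"


print(route_message("Hello there", stub_classifier, stub_scripted, stub_generative))
print(route_message("Why did the deal fail?", stub_classifier, stub_scripted, stub_generative))
```

Keeping the router separate from both back ends means the scripted side can continue to own dialogue state while the generative side is consulted only for unrecognized input.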
|
7 |
Multimodal Multi-label Classification with Small Foundation Models
Martin Björkdahl, Liv January 2024
The use of electronic health records (EHR) from various sources such as text, images and time-series data to make predictions or diagnoses has been researched previously. Many previous methods have used separate models, either for separate modalities or for distinct tasks. Recently, models trained to make medical predictions using multimodal input have emerged, as a unified approach would be beneficial for health practitioners. We present a single model to make medical predictions for several tasks, using diverse input from different modalities. We demonstrate the effectiveness of using an autoencoder method to project EHR data from three different modalities (images, text and time-series data) into the small language model Gemma-2B. Six projector models are used together with the small language model to perform multi-label prediction for 12 different medical prediction tasks. Results show that a jointly trained model using asymmetric loss, a loss function that dynamically emphasises positives that are poorly predicted, shows good performance and predicts evenly across tasks.
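As a rough illustration of the asymmetric loss mentioned above, the sketch below follows the commonly published form of asymmetric loss for multi-label classification: a focal-style weight on positives and a margin-shifted, more strongly down-weighted term for negatives. The abstract gives no formula, so this form, the function name `asymmetric_loss` and the hyperparameter defaults are assumptions, not the thesis's exact configuration.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=1.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Mean asymmetric loss over per-label probabilities and 0/1 targets.

    Positives: -(1 - p)^gamma_pos * log(p), so poorly predicted positives weigh more.
    Negatives: -(p_m)^gamma_neg * log(1 - p_m) with p_m = max(p - margin, 0),
    so easy negatives (p below the margin) contribute nothing.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            p_shifted = max(p - margin, 0.0)
            total += -(p_shifted ** gamma_neg) * math.log(max(1.0 - p_shifted, eps))
    return total / len(probs)


# A badly predicted positive costs far more than a well predicted one.
print(asymmetric_loss([0.1], [1]), asymmetric_loss([0.9], [1]))
```

In multi-label settings most labels are negative for any given sample, so damping easy negatives keeps them from swamping the gradient signal of the rarer positives.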
|
8 |
Skin lesion detection using deep learning
Rajit Chandra (12495442) 03 May 2022
Skin lesions can be deadly if not detected early, and early detection can save many lives. Artificial intelligence and machine learning are helping healthcare in many ways, including the diagnosis of skin lesions. Computer-aided diagnosis helps clinicians detect cancer. This study was conducted to classify seven classes of skin lesion using convolutional neural networks. Two pretrained models, DenseNet and Inception-v3, were employed to train the model, and accuracy, precision, recall, F1 score and ROC-AUC were calculated for every class prediction. Moreover, gradient class activation maps were used to help clinicians determine which regions of an image influence the model to make a certain decision. These visualizations are used for explainability of the model. Experiments showed that DenseNet performed better than Inception-v3. It was also noted that the gradient class activation maps highlighted different regions when predicting the same class. The main contribution was to introduce visualizations into the lesion classification model that help clinicians understand the model’s decisions, which is expected to enhance the reliability of the model. Different optimizers were also employed with both models to compare accuracies.
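The per-class metrics listed in the abstract above can be computed from true and predicted labels as in the following minimal sketch. The function name `per_class_metrics` and the class labels "A", "B" and "C" are hypothetical placeholders; the abstract does not name its seven lesion classes or describe its evaluation code.

```python
def per_class_metrics(y_true, y_pred, classes):
    """Return precision, recall and F1 for each class in a multi-class problem."""
    metrics = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical toy labels for three placeholder classes.
truth = ["A", "A", "B", "C"]
predicted_labels = ["A", "B", "B", "C"]
print(per_class_metrics(truth, predicted_labels, ["A", "B", "C"]))
```

Reporting these values per class rather than as a single accuracy figure matters when classes are imbalanced, which is common in lesion datasets.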
|