Multimodal Multi-label Classification with Small Foundation Models

The use of electronic health records (EHR) from various sources, such as text, images and time-series data, to make predictions or diagnoses has been researched previously. Many previous methods have used separate models, either for separate modalities or for distinct tasks. Recently, models trained to make medical predictions using multimodal input have emerged, as a unified approach would be beneficial for health practitioners. We present a single model to make medical predictions for several tasks, using diverse input from different modalities. We demonstrate the effectiveness of using an autoencoder method to project EHR data from three different modalities (images, text and time-series data) into the small language model Gemma-2B. Six projector models are used together with the small language model to perform multi-label prediction for 12 different medical prediction tasks. Results show that a jointly trained model using asymmetric loss, a loss function that dynamically emphasises positives that are poorly predicted, shows good performance and predicts evenly across tasks.
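The asymmetric loss mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation from the multi-label classification literature, with separate focusing exponents for positive and negative labels and a probability margin that discards easy negatives. The function name, the default hyperparameters and the example values are illustrative assumptions and are not taken from the thesis.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Sum an asymmetric loss over the labels of one multi-label example.

    probs   -- predicted probabilities, one per label (after a sigmoid)
    targets -- ground-truth labels, 1 for positive and 0 for negative
    All default hyperparameter values are assumptions for illustration.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive label: the factor (1 - p)^gamma_pos grows as the
            # prediction gets worse, so poorly predicted positives weigh more.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative label: shift the probability down by the margin so
            # that easy negatives (p <= margin) contribute nothing.
            p_shifted = max(p - margin, 0.0)
            total += -(p_shifted ** gamma_neg) * math.log(max(1.0 - p_shifted, eps))
    return total


# Hypothetical example: three labels, the first one positive.
example_loss = asymmetric_loss([0.2, 0.03, 0.6], [1, 0, 0])
```

With these assumed settings, a missed positive is penalised more heavily than an equally confident false positive, which is one way to counter the scarcity of positive labels that is typical of multi-label tasks.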

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-531045

language technology

LLM

multi-label classification

generative pretrained models

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-531045
Date	January 2024
Creators	Martin Björkdahl, Liv
Publisher	Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess