1.
From Pixels to Prices with ViTMAE: Integrating Real Estate Images through Masked Autoencoder Vision Transformers (ViTMAE) with Conventional Real Estate Data for Enhanced Automated Valuation / Från pixlar till priser med ViTMAE: Integrering av bostadsbilder genom Masked Autoencoder Vision Transformers (ViTMAE) med konventionell fastighetsdata för förbättrad automatiserad värdering
Ekblad Voltaire, Fanny. January 2024
This Master's thesis investigates the integration of Vision Transformers (ViTs) with Masked Autoencoder pre-training (ViTMAE) into real estate valuation, addressing the challenge of effectively analyzing the visual information in real estate images. The integration aims to improve the accuracy and efficiency of valuation, a task that has traditionally depended on realtor expertise and a physical inspection of the property. The research involved developing a model that combines visual features extracted with ViTMAE from listing images with traditional property data. Focusing on residential properties in Sweden, the study used a dataset of images and metadata from online real estate listings. An adapted ViTMAE model, accessed through the Hugging Face library, was trained on this dataset for feature extraction, and the extracted features were then combined with the metadata to create a multimodal valuation model. The results indicate that including ViTMAE-extracted image features improves prediction accuracy: the multimodal approach, which merges visual features with traditional metadata, was more accurate than models that use metadata alone. The thesis thereby demonstrates the potential of advanced image processing techniques for improving valuation models, and it lays the groundwork for future research on more refined, holistic valuation models that incorporate a wider range of factors beyond visual data.
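The abstract does not say how the ViTMAE features and the metadata are combined. The block below is a minimal sketch assuming simple early fusion: the per-patch embeddings that a ViTMAE encoder would output are mean-pooled into one image vector, concatenated with standardized metadata, and fed to a ridge-regression price model. Everything here is hypothetical and illustrative: the toy two-dimensional "embeddings", the metadata fields (living area in m², number of rooms), the synthetic prices, and the choice of ridge regression, which stands in for whatever valuation model the thesis actually used. It uses only the Python standard library, so no model download is needed.

```python
from statistics import fmean, pstdev


def mean_pool(patch_embeddings):
    """Average per-patch vectors into a single image-level vector."""
    n = len(patch_embeddings)
    return [sum(col) / n for col in zip(*patch_embeddings)]


def standardize(rows):
    """Z-score each metadata column across listings (population std)."""
    stats = [(fmean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def solve(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ridge(X, y, lam=1e-6):
    """Ridge regression via (X^T X + lam*I) w = X^T y (bias is penalized too)."""
    d = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(d)]
    return solve(A, b)


def mse(w, X, y):
    """Mean squared error of a linear model w on rows X."""
    preds = [sum(wj * xj for wj, xj in zip(w, x)) for x in X]
    return fmean((p - t) ** 2 for p, t in zip(preds, y))


# Hypothetical toy data: two 2-D patch embeddings per listing stand in for
# ViTMAE encoder outputs; metadata is [living area m^2, rooms].
patches = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
meta = [[50, 2], [70, 3], [90, 4], [60, 2], [120, 5], [80, 3]]
# Synthetic prices (MSEK) = 0.03 * area + first pooled image feature.
prices = [2.0, 2.1, 3.7, 1.8, 4.6, 2.9]

meta_z = standardize(meta)
image = [mean_pool(p) for p in patches]
X_meta = [m + [1.0] for m in meta_z]                      # metadata + bias
X_fused = [i + m + [1.0] for i, m in zip(image, meta_z)]  # early fusion + bias

meta_mse = mse(fit_ridge(X_meta, prices), X_meta, prices)
fused_mse = mse(fit_ridge(X_fused, prices), X_fused, prices)
print(f"metadata-only MSE: {meta_mse:.4f}, fused MSE: {fused_mse:.6f}")
```

Because the synthetic prices were constructed to depend on an image feature, the fused model fits them almost exactly while the metadata-only model cannot; this is a training-set illustration of the idea, not evidence about real listings. A real pipeline would replace `patches` with encoder outputs (for example, the `last_hidden_state` of Hugging Face's `ViTMAEModel`, with `mask_ratio` set to 0 in the config so that no patches are dropped at inference) and would report accuracy on held-out listings.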
2.
[pt] AJUSTE FINO DE MODELO AUTO-SUPERVISIONADO USANDO REDES NEURAIS SIAMESAS PARA CLASSIFICAÇÃO DE IMAGENS DE COVID-19 / [en] FINE-TUNING SELF-SUPERVISED MODEL WITH SIAMESE NEURAL NETWORKS FOR COVID-19 IMAGE CLASSIFICATION
ANTONIO MOREIRA PINTO. 03 December 2024
In recent years, self-supervised learning has demonstrated state-of-the-art performance in domains such as computer vision and natural language processing. However, fine-tuning these models for specific classification tasks, particularly with labeled data, remains challenging. This thesis introduces a novel approach to fine-tuning self-supervised models using Siamese Neural Networks trained with a semi-hard triplet loss function. The method aims to refine the latent-space representations of self-supervised models to improve their performance on downstream classification tasks. The proposed framework uses Masked Autoencoders for pre-training on a comprehensive dataset of radiographs, followed by fine-tuning with Siamese networks for effective feature separation and improved classification. The approach is evaluated on the COVIDx 9 dataset for COVID-19 detection from frontal chest radiographs, where it achieves a record accuracy of 98.5 percent, surpassing traditional fine-tuning techniques and the COVID-Net CRX 3 model. The results demonstrate the effectiveness of the method in increasing the utility of self-supervised models for complex medical imaging tasks. Future work will explore how well the approach scales to other domains and the integration of more sophisticated embedding-space loss functions.
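The abstract names a semi-hard triplet loss but not its exact formulation. The sketch below assumes the standard triplet loss, max(d(a, p) − d(a, n) + margin, 0), with the batch-wise semi-hard negative selection introduced in FaceNet: for each anchor-positive pair, the chosen negative is the closest one that is still farther from the anchor than the positive. The margin of 0.5, the Euclidean distance, and the fallback to the farthest negative when no semi-hard one exists are illustrative assumptions, and the embeddings are plain Python lists standing in for the outputs of the weight-shared (Siamese) encoder branches.

```python
import math


def semi_hard_triplet_loss(embeddings, labels, margin=0.5):
    """Mean triplet loss over all anchor-positive pairs in a batch.

    For each (anchor, positive) pair, the negative is the closest one that is
    still farther from the anchor than the positive (a semi-hard negative).
    If no such negative exists, the farthest negative is used as a fallback.
    """
    n = len(embeddings)
    # Pairwise Euclidean distances between all embeddings in the batch.
    dist = [[math.dist(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]
    losses = []
    for a in range(n):
        negatives = [k for k in range(n) if labels[k] != labels[a]]
        if not negatives:
            continue
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = dist[a][p]
            farther = [dist[a][k] for k in negatives if dist[a][k] > d_ap]
            d_an = min(farther) if farther else max(dist[a][k] for k in negatives)
            losses.append(max(d_ap - d_an + margin, 0.0))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical batch: labels 0 and 1 could stand for negative and
# COVID-19-positive radiographs; the classes overlap on a line.
batch = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
labels = [0, 0, 1, 1]
print(semi_hard_triplet_loss(batch, labels))  # 0.75
```

Because every input passes through the same encoder weights in a Siamese setup, minimizing this loss pulls same-class radiographs together and pushes the other class at least `margin` farther away in the embedding space. A training implementation would compute the same quantity on framework tensors so that gradients flow back into the pre-trained MAE encoder.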