This thesis explores the use of Optical Character Recognition (OCR)-based text extraction and Random Forest regression for housing market valuation, focusing on the impact of value factors derived from economic values that OCR extracts from housing cooperatives’ annual reports. The objective is to estimate prices with a Random Forest model, identify the key value factors that influence the estimates, and examine how the economic values from annual reports affect the sales price. The thesis aims to highlight an often-overlooked aspect of purchasing an apartment: the buyer also assumes the liabilities of the housing cooperative. The motivation for using OCR stems from the difficulty of manual data collection: readily accessible structured data on the subject is lacking, which makes automated extraction important. The findings indicate that OCR can effectively extract data from annual reports, but with limitations due to variation in report structures. The regression analysis shows that the Random Forest model is effective in estimating prices, with location and construction year emerging as the most influential factors. Furthermore, incorporating the economic values from the annual reports improves the accuracy of price estimation compared with a model that excludes them. However, definitive conclusions regarding the precise impact of these economic factors could not be drawn because of the limited geographical spread of the data points and potential hidden value factors. The study concludes that the machine learning model can be used to make a credible price estimate for cooperative apartments and that OCR methods are valuable for automating data extraction from annual reports, although standardising the report format would make them more efficient.
The thesis highlights the significance of considering the housing cooperatives’ economic values when making property purchases.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-506826 |
Date | January 2023 |
Creators | Lövgren, Sofia, Löthman, Marcus |
Publisher | Uppsala universitet, Avdelningen för systemteknik |
Source Sets | DiVA Archive at Uppsala University |
Language | Swedish |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | UPTEC STS, 1650-8319 ; 23016 |