
Toward an application of machine learning for predicting foreign trade in services – a pilot study for Statistics Sweden

The objective of this thesis is to investigate whether machine learning can be used at Statistics Sweden within the Foreign Trade in Services (FTS) statistic to predict how likely a unit is to conduct foreign trade in services. The FTS survey is a sample survey for which there is no natural frame to sample from. Therefore, a frame is constructed manually each year prior to sampling, starting from a register of all Swedish companies and agencies and narrowing it down, in a rule-based manner, to the units classified as likely to trade in services during the coming year. An automatic procedure that enables reliable predictions is requested. To this end, three machine learning methods have been analyzed: two rule-based methods (random forest and extreme gradient boosting) and one distance-based method (k nearest neighbors). The models arising from these methods are trained and tested on historically sampled units, for which it is known whether or not they traded. The results indicate that the two rule-based methods perform well in classifying likely traders. The random forest model is better at finding traders, while the extreme gradient boosting model is better at finding non-traders. The results also show interesting patterns across the different metrics studied for the models. Furthermore, when training the rule-based models, the year in which the training data was sampled needs to be taken into account. This means that cross-validation with random folds should not be used; grouped cross-validation based on year should be used instead. By including a feature that mirrors the state of the economy, the model can adapt its rules to it, so that the rules learned on training data can be extended to years beyond the training data. Based on the observed results, the final recommendation is to further develop and investigate the performance of the random forest model.
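To illustrate the point about grouped cross-validation based on year, the following is a minimal sketch using only the Python standard library. It is not taken from the thesis: the function name `year_grouped_folds` and the list `sample_years` are hypothetical, and the years are made-up values rather than FTS data. The sketch assumes one fold per sampling year, so that units sampled in the same year never appear in both the training and the test part of a fold.

```python
from collections import defaultdict


def year_grouped_folds(years):
    """Yield (held_out_year, train_idx, test_idx), holding out one year per fold.

    `years` holds the sampling year of each unit, in row order.
    """
    by_year = defaultdict(list)
    for i, year in enumerate(years):
        by_year[year].append(i)

    for held_out in sorted(by_year):
        test_idx = by_year[held_out]
        # Training rows are all rows sampled in any other year.
        train_idx = [
            i
            for year, idx in sorted(by_year.items())
            if year != held_out
            for i in idx
        ]
        yield held_out, train_idx, test_idx


# Made-up sampling years for eight units (illustrative only, not thesis data).
sample_years = [2018, 2018, 2019, 2019, 2019, 2020, 2020, 2021]

folds = list(year_grouped_folds(sample_years))
for held_out, train_idx, test_idx in folds:
    train_years = {sample_years[i] for i in train_idx}
    # The held-out year must never leak into the training rows.
    assert held_out not in train_years
    print(held_out, train_idx, test_idx)
```

With random folds, rows from a single year would typically be spread over both the training and the test part, which is the situation the abstract advises against. Libraries such as scikit-learn offer grouped splitters for the same purpose; whether the thesis used one is not stated here.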

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:su-224441
Date: January 2023
Creators: Unnebäck, Tea
Publisher: Stockholms universitet, Statistiska institutionen
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
Relation: Tea Unnebäck
