Artificial Transactional Data Generation for Benchmarking Algorithms / Generering av artificiell transaktionsdata för att prestandamäta algoritmer

Modern retailers have been collecting more and more data over the past decades. The growing volume of collected data has led to higher demand for data analytics expertise and tools, which the Umeå-founded company Infobaleen provides. A recurring challenge when developing such tools is the data itself. Difficulties in finding relevant open data sets have made synthetic data increasingly popular. By using artificially generated data, developers gain more control over the input when testing and presenting their work. However, most methods that exist today either depend on real-world data as input or produce results that look synthetic and are difficult to extend. In this thesis, I introduce a method specifically designed to generate synthetic transactional data stochastically. I first examined real-world data provided by Infobaleen to determine empirically which statistical distributions were suitable for my algorithm. I then modelled individual decision-making using points in an embedding space, where the distance between the points serves as the basis for probability weights unique to each individual. This solution creates data distributed similarly to real-world data and enables retroactive data enrichment using the same embeddings. The result is a data set that looks genuine to the human eye but is entirely synthetic. Infobaleen already generates data with this model when presenting its product to new potential customers or partners.
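The idea of turning embedding distances into per-individual probability weights can be illustrated with a minimal sketch. This is not the thesis's implementation: the two-dimensional embedding, the exponential distance kernel, the `scale` parameter, and the function names `purchase_weights` and `generate_transactions` are all assumptions chosen for illustration.

```python
import math
import random


def purchase_weights(customer, products, scale=1.0):
    """Map customer-to-product distances to normalized probability weights.

    Assumption: an exponential kernel, so closer products get larger weights.
    The abstract does not say which kernel the thesis uses.
    """
    raw = [math.exp(-math.dist(customer, p) / scale) for p in products]
    total = sum(raw)
    return [w / total for w in raw]


def generate_transactions(customers, products, n_per_customer, seed=0):
    """Sample (customer_id, product_id) rows using each customer's own weights."""
    rng = random.Random(seed)
    rows = []
    for cid, customer in enumerate(customers):
        weights = purchase_weights(customer, products)
        picks = rng.choices(range(len(products)), weights=weights, k=n_per_customer)
        rows.extend((cid, pid) for pid in picks)
    return rows


# Hypothetical embeddings: points drawn from a standard normal in 2D.
rng = random.Random(42)
products = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5)]
customers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(3)]

transactions = generate_transactions(customers, products, n_per_customer=4)
```

Because each customer sits at a different point, each one gets a different weight vector over the same products, which is one plausible reading of the abstract's individually unique probability weights.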

http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-204096

Data Analytics

Synthetic Data

Statistical Distribution

Embedding

Transactional Data

Computer Sciences

Datavetenskap (datalogi)

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-204096
Date	January 2023
Creators	Lundgren, Veronica
Publisher	Umeå universitet, Institutionen för fysik
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
