Return to search

LSTM Feature Engineering Through Time Series Similarity Embedding / Aspektkonstruktion för LSTM-nätverk genom inbäddning av tidsserielikheter

Time series prediction has many applications. In cases with simultaneous series (like measurements of weather from multiple stations, or multiple stocks on the stock market)it is not unlikely that these series from different measurement origins behave similarly, or respond to the same contextual signals. Training input to a prediction model could be constructed from all simultaneous measurements to try and capture the relations between the measurement origins. A generalized approach is to train a prediction model on samples from any individual measurement origin. The data mass is the same in both cases, but in the first case, fewer samples of a larger width are used, while the second option uses a higher number of smaller samples. The first, high-width option, risks over-fitting as a result of fewer training samples per input variable. The second, general option, would have no way to learn relations between the measurement origins. Amending the general model with contextual information would allow for keeping a high samples-per-variable ratio without losing the ability to take the origin of the measurements into account. This thesis presents a vector embedding method for measurement origins in an environment with shared response to contextual signals. The embeddings are based on multi-variate time series from the origins. The embedding method is inspired by co-occurrence matrices commonly used in Natural Language Processing. The similarity measures used between the series are Dynamic Time Warping (DTW), Step-wise Euclidean Distance, and Pearson Correlation. The dimensionality of the resulting embeddings is reduced by Principal Component Analysis (PCA) to increase information density, and effectively preserve variance in the similarity space. The created embedding system allows contextualization of samples, akin to the human intuition that comes from knowing where measurements were taken from, like knowing what sort of company a stock ticker represents, or what environment a weather station is located in. In the embedded space, embeddings of series from fundamentally similar measurement origins are closely located, so that information regarding the behavior of one can be generalized to its neighbors. The resulting embeddings from this work resonate well with existing clustering methods in a weather dataset, and partially in a financial dataset, and do provide performance improvement for an LSTM network acting on said financial dataset. The similarity embeddings also outperform an embedding layer trained together with the LSTM.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-188741
Date January 2022
CreatorsBångerius, Sebastian
PublisherLinköpings universitet, Institutionen för datavetenskap
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0019 seconds