In a modern vehicle system, the data generated are time series large enough to qualify as big data. Many of the time series contain interesting patterns, either densely populated or sparsely distributed over the data. For engineers to review the data, segmentation is crucial for data reduction, which is why this thesis investigates unsupervised segmentation of time series. This report uses two different methods, Fast Low-cost Unipotent Semantic Segmentation (FLUSS) and Information Gain-based Temporal Segmentation (IGTS). They take different approaches: FLUSS is shape-based and IGTS is statistical. The goal is to evaluate their strengths and weaknesses on tailored time series data that has properties suiting one or more of the models. The data is constructed from an open dataset, the cricket dataset, which contains labelled segments. These segments are then concatenated to create datasets with specific properties. Evaluation metrics suitable for segmentation are discussed and evaluated. The experiments make clear that all models have strengths and weaknesses, so the outcome depends on the combination of data and model. The shape-based model, FLUSS, cannot handle recurring events or regimes. However, linear transitions between regimes, e.g. A to B to C, give very good results if the regimes are not too similar. The statistical model, IGTS, yields a segmentation that is not intuitive for humans, but it could be a good way to reduce data in a preprocessing step. It can automatically reduce the number of segments to the optimal value based on entropy, which may or may not be desirable depending on the goal. Overall, the methods performed at worst on par with the random segmentation baseline, and in every test at least one model outperformed it. Unsupervised segmentation of time series is a difficult problem, and the results will be highly dependent on the target data.
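The data construction and the random baseline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names, the toy segments and the boundary-error score are assumptions made for illustration, and the score is not necessarily one of the evaluation metrics discussed in the thesis.

```python
import random


def concatenate_segments(segments):
    """Join labelled segments into one series and record where each one ends."""
    series, boundaries = [], []
    for values in segments:
        series.extend(values)
        boundaries.append(len(series))
    # The last index is the end of the series, not a regime change.
    return series, boundaries[:-1]


def random_segmentation(length, n_boundaries, seed=None):
    """Baseline: pick distinct boundary positions uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, length), n_boundaries))


def mean_boundary_error(true_boundaries, predicted, length):
    """Hypothetical score: mean distance from each true boundary to the
    nearest predicted boundary, normalised by series length (0 is perfect)."""
    if not predicted:
        raise ValueError("at least one predicted boundary is required")
    if not true_boundaries:
        return 0.0
    total = sum(min(abs(t - p) for p in predicted) for t in true_boundaries)
    return total / (len(true_boundaries) * length)


# Toy stand-ins for labelled segments (regimes A, B, C); not the cricket data.
segments = [[0.0] * 50, [1.0] * 30, [0.5] * 20]
series, true_b = concatenate_segments(segments)
guess = random_segmentation(len(series), len(true_b), seed=0)
baseline_error = mean_boundary_error(true_b, guess, len(series))
```

A segmentation method would be scored with the same function and compared against `baseline_error`, ideally averaged over many random seeds rather than a single draw.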
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-176519 |
Date | January 2021 |
Creators | Svensson, Martin |
Publisher | Linköpings universitet, Statistik och maskininlärning |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |