  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A dynamic approximate representation scheme for streaming time series

Zhou, Pu January 2009 (has links)
The huge volume of time series data generated in many applications poses new challenges for data storage, transmission, and computation. Furthermore, when the time series arrive as streaming data, new problems emerge and new techniques are required because of the streaming characteristics: high volume, high speed, and continuous flow. Approximate representation is one of the most efficient and effective solutions to this large-volume, high-speed problem. In this thesis, we propose a dynamic representation scheme for streaming time series. Existing methods use a single function form for the entire approximation task. In contrast, our method adopts a set of candidate functions such as linear functions, polynomial functions (degree ≥ 2), and exponential functions. We provide a novel segmenting strategy to generate subsequences and dynamically choose candidate functions to approximate them.
Since we are dealing with streaming time series, the segmenting points and the corresponding approximating functions are produced incrementally. For a given function form, we use a buffer window to find the locally farthest possible segmenting point under a user-specified error tolerance threshold. To achieve this, we define a feasible space for the coefficients of the function and show that the locally best segmenting point can be found indirectly by computation in this coefficient space. Given the error tolerance threshold, the candidate function that represents the most information per parameter is chosen as the approximating function, making our representation scheme more flexible and compact. We provide two dynamic algorithms, PLQS and PLQES, which involve two and three candidate functions, respectively. We also present a general strategy for function selection when more candidate functions are considered.
In our experiments, we examine the effectiveness of our algorithms on synthetic and real time series data sets. We compare our method with the piecewise linear approximation method, and the experimental results demonstrate the clear superiority of our dynamic approach under the same error tolerance threshold.
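The core segmenting idea above can be sketched in a greatly simplified, offline form: grow each segment while a least-squares fit stays within the error tolerance, then start a new segment. Note this is an illustrative stand-in, not the thesis's PLQS/PLQES algorithms, which work incrementally over a stream, choose among several candidate function forms, and locate segmenting points via a feasible space in coefficient space.

```python
# Hedged sketch: greedy piecewise-linear segmentation under a max-error
# tolerance. The real PLQS/PLQES algorithms additionally choose among
# linear, polynomial, and exponential candidates per segment.

def linear_fit(xs, ys):
    """Ordinary least-squares line y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def max_residual(xs, ys, a, b):
    return max(abs(y - (a * x + b)) for x, y in zip(xs, ys))

def greedy_segments(ys, tol):
    """Split a series into maximal runs whose linear-fit error <= tol."""
    segments, start = [], 0
    while start < len(ys):
        end = start + 1
        while end < len(ys):
            xs = list(range(start, end + 1))
            a, b = linear_fit(xs, ys[start:end + 1])
            if max_residual(xs, ys[start:end + 1], a, b) > tol:
                break
            end += 1
        segments.append((start, end))  # half-open [start, end)
        start = end
    return segments

series = [0, 1, 2, 3, 10, 11, 12, 5, 4, 3]
print(greedy_segments(series, tol=0.5))  # → [(0, 4), (4, 7), (7, 10)]
```

The dynamic scheme improves on this by also comparing, at each segmenting point, how much information each candidate function form captures per parameter.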
2

An Enhanced Approach using Time Series Segmentation for Fault Detection of Semiconductor Manufacturing Process

Tian, Runfeng 28 October 2019 (has links)
No description available.
3

Time Series Online Empirical Bayesian Kernel Density Segmentation: Applications in Real Time Activity Recognition Using Smartphone Accelerometer

Na, Shuang 28 June 2017 (has links)
Time series analysis has been explored in many areas, such as statistical research, engineering applications, medical analysis, and finance. To represent the data more efficiently, the mining process is supported by time series segmentation. A time series segmentation algorithm looks for the change points between two different patterns and develops a suitable model for the data observed in each segment. Given limited computing and storage capability, it is necessary to consider an adaptive and incremental online segmentation method. In this study, we propose Online Empirical Bayesian Kernel Segmentation (OBKS), which combines Online Multivariate Kernel Density Estimation (OMKDE) with an Online Empirical Bayesian Segmentation (OBS) algorithm. This method uses the online multivariate kernel density as the predictive distribution within the online empirical Bayesian segmentation, instead of the posterior predictive distribution. The benefit of OMKDE is that it does not require a pre-defined prior, which makes it more adaptive and adjustable than the posterior predictive distribution. Human Activity Recognition (HAR) with smartphone-embedded sensors is a modern time series application used in many areas, such as therapeutic applications and automotive sensing. Important procedures related to the HAR problem include classification, clustering, feature extraction, dimension reduction, and segmentation. Segmentation, as the first step of HAR analysis, attempts to represent each time interval more effectively and efficiently. The traditional segmentation approach in HAR partitions the time series into short, fixed-length segments. However, these segments might not be long enough to capture sufficient information about the entire activity interval.
In this research, we first segment the observations of a whole activity as a single interval using the Online Empirical Bayesian Kernel Segmentation algorithm. A smartphone with a built-in accelerometer generates the observations of these activities. Based on the segmentation result, we introduce a two-layer random forest classification method. The first layer identifies the main group; the second layer analyzes the subgroups within each main group. We evaluate the performance of our method on six activities performed by 30 volunteers: sitting, standing, lying, walking, walking upstairs, and walking downstairs. Automatically detecting walking upstairs and walking downstairs requires more information and more detailed, complex features, since these two activities are very similar. For real-time activity recognition on smartphones with embedded accelerometers, the first layer classifies activities as static or dynamic, and the second layer then classifies each main group into its sub-classes, depending on the first-layer result. On the collected data, we obtain an overall accuracy of 91.4% over the six activities, and an overall accuracy of 100% when distinguishing only the dynamic activities (walking, walking upstairs, walking downstairs) from the static activities (sitting, standing, lying).
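The two-layer dispatch described above can be sketched as follows. This is a toy illustration, not the thesis's method: the variance threshold on accelerometer magnitude and the placeholder per-group classifiers are stand-ins for the two trained random forests.

```python
# Hedged sketch of a two-layer activity classifier: layer 1 separates
# static from dynamic segments (here via a toy variance threshold on
# accelerometer magnitude); layer 2 dispatches to a per-group classifier.
# Both layers are random forests in the thesis; these are placeholders.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def classify_static(segment):
    # Placeholder for the static-group random forest
    return "sitting/standing/lying"

def classify_dynamic(segment):
    # Placeholder for the dynamic-group random forest
    return "walking/upstairs/downstairs"

def two_layer_classify(segment, threshold=0.05):
    group = "dynamic" if variance(segment) > threshold else "static"
    if group == "dynamic":
        return group, classify_dynamic(segment)
    return group, classify_static(segment)

print(two_layer_classify([9.8, 9.81, 9.79, 9.8]))      # near-constant magnitude
print(two_layer_classify([9.8, 12.0, 7.5, 11.2, 8.1]))  # oscillating magnitude
```

The design rationale mirrors the reported accuracies: the static/dynamic split is nearly perfect, so most residual error is confined to the harder within-group decisions at the second layer.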
4

Unsupervised Segmentation of Time Series Data

Svensson, Martin January 2021 (has links)
In a modern vehicle system, the amount of time series data generated is large enough to qualify as big data. Many of the time series contain interesting patterns, either densely populated or scarcely distributed over the data. For engineers to review the data, segmentation is crucial for data reduction, which is why this thesis investigates unsupervised segmentation of time series. This report uses two different methods, Fast Low-cost Unipotent Semantic Segmentation (FLUSS) and Information Gain-based Temporal Segmentation (IGTS), which take different approaches: shape-based and statistical, respectively. The goal is to evaluate their strengths and weaknesses on tailored time series data with properties suiting one or more of the models. The data is constructed from an open dataset, the cricket dataset, which contains labelled segments; these are concatenated to create datasets with specific properties. Evaluation metrics suitable for segmentation are discussed and evaluated. The experiments make clear that all models have strengths and weaknesses, so the outcome depends on the data and model combination. The shape-based model, FLUSS, cannot handle recurring events or regimes. However, linear transitions between regimes, e.g. A to B to C, give very good results if the regimes are not too similar. The statistical model, IGTS, yields a segmentation that is non-intuitive for humans, but could be a good way to reduce data in a preprocessing step. It can automatically reduce the number of segments to the optimal value based on entropy, which may or may not be desirable depending on the goal. Overall, the methods delivered at worst the same results as a random segmentation baseline, and in every test one or more models outperformed this baseline. Unsupervised segmentation of time series is a difficult problem and is highly dependent on the target data.
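The information-gain criterion behind IGTS can be illustrated with a toy single-split version: choose the split point that maximally reduces the entropy of the resulting segments. The real IGTS operates on multivariate numeric channels and finds k splits top-down; this sketch only shows the criterion, on a symbolized series.

```python
# Hedged sketch of the IGTS split criterion: pick the split index that
# minimizes the length-weighted entropy of the two halves, i.e. that
# maximizes information gain over the unsplit series.
import math

def entropy(seq):
    n = len(seq)
    counts = {}
    for s in seq:
        counts[s] = counts.get(s, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def best_split(seq):
    """Return (split_index, information_gain) for the best single split."""
    n = len(seq)
    best_i, best_cost = None, float("inf")
    for i in range(1, n):
        cost = (i * entropy(seq[:i]) + (n - i) * entropy(seq[i:])) / n
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i, entropy(seq) - best_cost

seq = list("AAAABBBB")
print(best_split(seq))  # splits exactly where the regime changes
```

This also hints at why IGTS can pick the number of segments automatically: once the segments are pure, further splits yield no additional entropy reduction.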
5

An empirical study of the impact of data dimensionality on the performance of change point detection algorithms / En empirisk studie av data dimensionalitetens påverkan på change point detection algoritmers prestanda

Noharet, Léo January 2023 (has links)
When a system is monitored over time, changes can be discovered in the time series of the monitored variables. Change Point Detection (CPD) aims to find the time point at which a change occurs in the monitored system. While CPD methods date back to the 1950s, with applications in quality control, few studies have examined the impact of data dimensionality on CPD algorithms. This thesis addresses this gap by examining five different algorithms using synthetic data that incorporates changes in mean, covariance, and frequency across dimensionalities up to 100. Additionally, the algorithms are evaluated on a collection of data sets from various domains. The studied methods are then assessed and ranked based on their performance on both synthetic and real data sets, to aid future users in selecting an appropriate CPD method. Finally, stock data from the 30 most traded companies on the Swedish stock market is collected to create a new CPD data set, to which the CPD algorithms are applied; the changes they aim to detect are the changes in the policy rate set by the Swedish central bank, the Riksbank. The results show that dimensionality impacts the accuracy of the methods when noise is present and when the degree of mean or covariance change is small. Additionally, applying the algorithms to real-world data sets reveals large differences in performance between the studied methods, underlining the importance of comparison studies. Ultimately, the kernel-based CPD method performed best across the real-world data sets employed in the thesis.
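The simplest member of the CPD family studied here, a single mean-change detector, can be sketched as follows: pick the split that minimizes the total within-segment squared error. This is an illustrative baseline, not one of the thesis's five algorithms; libraries such as ruptures generalize the idea to multiple change points, kernel costs, and penalty-based model selection.

```python
# Hedged sketch of single-change-point detection in the mean: the split
# minimizing the sum of within-segment squared errors is the estimated
# change point. Extending to k changes or kernel costs is what full CPD
# libraries provide.

def sq_error(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def single_change_point(xs):
    """Return the split index t minimizing cost of [0,t) and [t,n)."""
    n = len(xs)
    best_t, best_cost = None, float("inf")
    for t in range(1, n):
        cost = sq_error(xs[:t]) + sq_error(xs[t:])
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

signal = [0.1, -0.2, 0.0, 0.1, 5.0, 5.2, 4.9, 5.1]
print(single_change_point(signal))  # → 4
```

The dimensionality question the thesis studies arises when `signal` becomes a vector series: the per-segment cost must then aggregate across channels, and noise in irrelevant dimensions can mask a small mean or covariance change.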
6

Identification of Fundamental Driving Scenarios Using Unsupervised Machine Learning / Identifiering av grundläggande körscenarier med icke-guidad maskininlärning

Anantha Padmanaban, Deepika January 2020 (has links)
A challenge in releasing autonomous vehicles onto public roads is safety verification of the developed features. Safety test driving is not practically feasible, as the acceptance criterion requires driving at least 2.1 billion kilometers [1]. An alternative to this distance-based testing is the scenario-based approach, in which intelligent vehicles are exposed to known scenarios. Identifying such scenarios in driving data is crucial for this validation. The aim of this thesis is to investigate the possibility of unsupervised identification of driving scenarios from driving data. The task is performed in two major parts: first, segmentation of the time series driving data by detecting change points, followed by clustering of the obtained segments. Time series segmentation is approached with a deep learning method, while the second task is performed using time series clustering. The work also includes a visual approach for validating the time series segmentation, followed by a quantitative measure of performance. The approach is also qualitatively compared against a Bayesian nonparametric approach to assess the usefulness of the proposed method. Based on the analysis of the results, the usefulness and drawbacks of the method are discussed, followed by the scope for future research.
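The two-stage pipeline described in this entry (segment at change points, then cluster the segments so recurring scenarios share a label) can be sketched in miniature. This is a stand-in: the thesis uses a deep-learning segmenter and time series clustering, whereas here the change points are given and the segments are grouped greedily by their mean.

```python
# Hedged sketch of segment-then-cluster scenario identification:
# (1) cut the signal at change points, (2) assign each segment to the
# first cluster whose centroid mean is within a tolerance, else open a
# new cluster. Recurring scenarios thus receive the same label.

def segment_by_changepoints(xs, changepoints):
    bounds = [0] + sorted(changepoints) + [len(xs)]
    return [xs[a:b] for a, b in zip(bounds, bounds[1:])]

def cluster_by_mean(segments, tol=1.0):
    """Greedy nearest-centroid clustering on the segment mean."""
    centroids, labels = [], []
    for seg in segments:
        m = sum(seg) / len(seg)
        for i, c in enumerate(centroids):
            if abs(m - c) <= tol:
                labels.append(i)
                break
        else:
            centroids.append(m)
            labels.append(len(centroids) - 1)
    return labels

speed = [10, 10, 11, 30, 31, 29, 10, 11, 10]  # cruise, fast, cruise again
segs = segment_by_changepoints(speed, [3, 6])
print(cluster_by_mean(segs))  # → [0, 1, 0]
```

The repeated label for the first and last segments is the point of the second stage: a recurring driving scenario is identified as the same cluster each time it appears.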
