41 |
Using Synthetic Data to Model Mobile User Interface Interactions
Jalal, Laoa January 2023
Usability testing of a User Interface (UI) is a central part of assuring high-quality UI design that provides a good user experience across multiple user groups. The process of usability testing often requires extensive collection of user feedback, preferably across multiple user groups, to ensure an unbiased observation of the potential design flaws within the UI design. Obtaining feedback from certain user groups has proven challenging, due to factors such as medical conditions that limit the ability of users to participate in the usability test. An absence of these hard-to-access groups can lead to designs that fail to consider their unique needs and preferences, which may result in a worse user experience for these individuals. In this thesis, we try to address the current gaps in data collection for usability tests by investigating whether the Generative Adversarial Network (GAN) framework can be used to generate high-quality synthetic user interactions of a particular UI gesture across multiple user groups. To this end, UI interactions were collected from two user groups, namely the elderly and the young population, where the UI interaction in focus was the drag-and-drop operation. The datasets of the two user groups were used to train separate GANs, both using the doppelGANger architecture, and the generated synthetic data were evaluated based on their diversity, how well temporal correlations are preserved, and their performance compared to the real data when used in a classification task. The experimental results show that both GANs produce high-quality synthetic resemblances of the drag-and-drop operation, where the synthetic samples show both diversity and uniqueness when compared to the actual dataset. The synthetic dataset for both user groups also exhibits statistical properties similar to those of the original dataset, such as the per-sample length distribution and the temporal correlations within the sequences. Furthermore, the synthetic dataset shows, on average, similar precision, recall and F1 scores compared to the actual dataset when used to train a classifier that distinguishes between drag-and-drop sequences of the elderly and the younger population. Further research on the use of multiple UI gestures, on using a single GAN to generate UI interactions across multiple user groups, and on a comparative study of different GAN architectures would provide valuable insight into the unexplored potential and possible limitations within this particular problem domain.
|
42 |
Jämförelse av datakomprimeringsalgoritmer för sensordata i motorstyrenheter / Comparison of data compression algorithms for sensor data in engine control units
Möller, Malin, Persson, Dominique January 2023
Limited processor and memory capacity is a major challenge for logging sensor signals in engine control units. Compression can be used to store larger amounts of data in these units. To implement compression in engine control units, the algorithms must cope with the limits on processor capacity while still achieving an acceptable compression ratio. This thesis compares compression algorithms on sensor data from engine control units in order to investigate which algorithm or algorithms are best suited for this application. The work aims to improve the logging of sensor data and thereby make troubleshooting of engine control units more efficient. This was done by developing a system that runs different compression algorithms on sampled sensor data from engine control units and calculates the compression time and compression ratio. The results indicated that delta-of-delta compression performed better than XOR compression for the tested data sets. Delta-of-delta had a significantly better compression ratio, while the differences in compression time between the algorithms were minor. Delta-of-delta compression was judged to have good potential for implementation in engine control unit logging systems. The algorithm is deemed well suited for logging smaller time series during important events. For continuous logging of larger time series, further research is suggested to investigate how the compression ratio can be improved further.
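As an illustration of the two transforms compared in this abstract, the following is a minimal sketch and not the thesis's implementation. The function names and the sample values are hypothetical. It shows only the transform stage: delta-of-delta stores second-order differences of integer samples, and the XOR variant stores the XOR of consecutive 64-bit float patterns. A real compressor would follow either transform with a bit-packing stage that stores the resulting small or zero values in few bits, which is omitted here.

```python
import struct


def delta_of_delta(values):
    """Encode integers as [first value, first delta, second-order differences...]."""
    if len(values) < 2:
        return list(values)
    deltas = [b - a for a, b in zip(values, values[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return [values[0], deltas[0]] + dods


def undo_delta_of_delta(encoded):
    """Invert delta_of_delta by accumulating the deltas again."""
    if len(encoded) < 2:
        return list(encoded)
    values = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        values.append(values[-1] + delta)
    return values


def xor_encode(values):
    """XOR the 64-bit pattern of each float with that of its predecessor."""
    bits = [struct.unpack(">Q", struct.pack(">d", v))[0] for v in values]
    if not bits:
        return []
    return [bits[0]] + [b ^ a for a, b in zip(bits, bits[1:])]


# Hypothetical, regularly sampled sensor readings (for example raw ADC counts).
samples = [1000, 1010, 1020, 1030, 1041, 1051]
encoded = delta_of_delta(samples)
```

For a signal that changes at a nearly constant rate, most second-order differences are zero or close to zero, which is what makes the later bit-packing stage effective.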
|
43 |
Ontology-based discovery of time-series data sources for landslide early warning system
Phengsuwan, J., Shah, T., James, P., Thakker, Dhaval, Barr, S., Ranjan, R. 15 July 2019
A modern early warning system (EWS) requires sophisticated knowledge of natural hazards, the urban context and underlying risk factors to enable dynamic and timely decision making (e.g., hazard detection, hazard preparedness). Landslides are a common form of natural hazard with a global impact and are closely linked to a variety of other hazards. An EWS for landslide prediction and detection relies on scientific methods and models which require input from time series data, such as earth observation (EO) and urban environment data. Such data sets are produced by a variety of remote sensing satellites and Internet of Things sensors deployed in landslide-prone areas. The automatic discovery of potential time series data sources has become a challenge due to the complexity and high variety of data sources. To address this research problem, in this paper we propose a novel ontology, namely Landslip Ontology, to provide the knowledge base that establishes the relationship between landslide hazards and EO and urban data sources. The purpose of Landslip Ontology is to facilitate time series data source discovery for the verification and prediction of landslide hazards. The ontology is evaluated based on scenarios and competency questions to verify its coverage and consistency. Moreover, the ontology can also be used to implement a data source discovery system, which is an essential component of an EWS that needs to manage (store, search, process) rich information from heterogeneous data sources.
|
44 |
Creation of a Time-Series Data Cleaning Toolbox
Kovács, Márton January 2024
A significant drawback of currently used data cleaning methods is their reliance on domain knowledge or a background in data science, and with the vast number of possible solutions to this problem, the step of data cleaning may be forgone entirely when developing a machine learning (ML) model. Since skipping this stage altogether results in lower performance for ML models, a general-purpose time-series data cleaning user interface (UI) was developed in Python [1], with a target user base of people unfamiliar with data cleaning. Following the development, the UI was tested on time-series datasets available in online repositories, and the estimation performance of ML models trained on the original datasets was compared with that of models trained on datasets cleaned through the UI. This comparison showed that the use of the UI can result in significant improvements to the performance of ML models; however, the degree of improvement is highly dataset dependent.
|
45 |
Time series data mining using complex networks / Mineração de dados em séries temporais usando redes complexas
Ferreira, Leonardo Nascimento 15 September 2017
A time series is a time-ordered dataset. Due to its ubiquity, time series analysis is of interest to many scientific fields. Time series data mining is a research area that aims to extract information from these time-related data. To achieve this, different models are used to describe series and search for patterns. One approach to modeling temporal data is to use complex networks. In this case, temporal data are mapped to a topological space that allows data exploration using network techniques. In this thesis, we present solutions for time series data mining tasks using complex networks. The primary goal was to evaluate the benefits of using network theory to extract information from temporal data. We focused on three mining tasks. (1) In the clustering task, we represented every time series by a vertex and connected vertices that represent similar time series. We used community detection algorithms to cluster similar series. Results show that this approach performs better than traditional clustering approaches. (2) In the classification task, we mapped every labeled time series in a database to a visibility graph. We performed classification by transforming an unlabeled time series into a visibility graph and comparing it to the labeled graphs using a distance function. The new label is the most frequent label among the k nearest graphs. (3) In the periodicity detection task, we first transform a time series into a visibility graph. Local maxima in a time series are usually mapped to highly connected vertices that link two communities. We used the community structure to propose a periodicity detection algorithm for time series. This method is robust to noisy data and does not require parameters. With the methods and results presented in this thesis, we conclude that network science is beneficial to time series data mining. Moreover, this approach can provide better results than traditional methods. It is a new form of extracting information from time series and can be easily extended to other tasks.
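The mapping from a time series to a visibility graph, used in tasks (2) and (3) of this abstract, can be sketched as follows. This is a minimal sketch assuming the natural visibility criterion, under which two samples are linked when the straight line between them passes above every sample in between; the abstract does not state which visibility variant the thesis uses. The function names and the example series are hypothetical, and the quadratic-time loop is chosen for clarity rather than speed.

```python
def visibility_graph(series):
    """Return the edge set of the natural visibility graph of a sequence."""
    n = len(series)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Samples i and j see each other if every sample between them
            # lies strictly below the line joining (i, y_i) and (j, y_j).
            visible = all(
                series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges


def degrees(n, edges):
    """Count how many edges touch each of the n vertices."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


# Hypothetical series with two local maxima at positions 1 and 3.
series = [1, 3, 1, 3, 1]
edges = visibility_graph(series)
deg = degrees(len(series), edges)
```

In this small example the two local maxima end up with the highest degree, which is consistent with the abstract's remark that local maxima are usually mapped to highly connected vertices.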
|
46 |
Waveform clustering - Grouping similar power system events
Eriksson, Therése, Mahmoud Abdelnaeim, Mohamed January 2019
Over the last decade, data has become a highly valuable resource. Electrical power grids deal with large quantities of data and continuously collect it for analytical purposes. Anomalies that occur within this data are important to identify since they could cause nonoptimal performance within the substations or, in worse cases, damage to the substations themselves. However, for large datasets with millions of records it is hard or even impossible to gain a reasonable overview of the data manually. When collecting data from electrical power grids, predefined triggering criteria are often used to indicate that an event has occurred within the specific system. This makes it difficult to search for events that are unknown to the operator of the deployed acquisition system. Clustering, an unsupervised machine learning method, can be utilised for fault prediction within systems generating large amounts of unlabelled multivariate time-series data, and can group data more efficiently and without the bias of a human operator. A large number of clustering techniques exist, as well as methods for extracting information from the data itself, and identifying these was of utmost importance. This thesis presents a study of the methods involved in creating a clustering system suitable for this specific type of data. The objective of the study was to identify methods that enable finding the underlying structures of the data and clustering the data based on these. The signals were split into multiple frequency sub-bands, from which features could be extracted and evaluated. Using suitable combinations of features, the data was clustered with two different clustering algorithms, CLARA and CLARANS, and evaluated with established quality analysis methods. The results indicate that CLARA performed best overall on all the tested feature sets. The formed clusters hold valuable information, such as indications of unknown events within the system, and if similar events are clustered together this can help a human operator investigate the importance of the clusters themselves. A further conclusion from the results is that research into more optimised clustering algorithms is necessary before expansion to larger datasets can be considered.
|
47 |
Modelování durací mezi finančními transakcemi / Modeling of duration between financial transactions
Voráčková, Andrea January 2018
This diploma thesis deals with the properties of the ACD process and methods of its estimation. First, the basic definitions and relations between ARMA and GARCH processes are stated. In the second part of the thesis, the ACD process is defined and the relation between ARMA and ACD is shown. Then we show the methods of data adjustment, estimation, prediction and verification of the ACD model. After that, particular cases of the ACD process (EACD, WACD, GACD and GEVACD) are introduced together with their properties and motivational examples. The numerical part is performed in R and concerns the precision of the estimates and predictions of the special cases of the ACD model depending on the length of the series and the number of simulations. In the last part, we apply the methods stated in the theoretical part to real data. The adjustment of the data and the estimation of the parameters are performed, as well as the verification of the ACD model. After that, we predict a few steps ahead and compare the predictions with the real durations.
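For orientation, the ACD (autoregressive conditional duration) process and its relation to ARMA mentioned in this abstract can be written in the standard textbook form below. The notation is an assumption and is not taken from the thesis: $x_i$ is the $i$-th duration, $\psi_i$ its conditional expectation, and $\varepsilon_i$ a positive i.i.d. innovation with unit mean.

```latex
\begin{align}
  x_i &= \psi_i \,\varepsilon_i, \qquad \mathrm{E}[\varepsilon_i] = 1, \\
  \psi_i &= \omega + \sum_{j=1}^{p} \alpha_j x_{i-j} + \sum_{j=1}^{q} \beta_j \psi_{i-j}.
\end{align}
% ARMA representation for ACD(1,1), using \eta_i = x_i - \psi_i:
\begin{equation}
  x_i = \omega + (\alpha_1 + \beta_1)\, x_{i-1} + \eta_i - \beta_1 \eta_{i-1}.
\end{equation}
```

The last line follows by substituting $\psi_i = x_i - \eta_i$ into the recursion for $\psi_i$ with $p = q = 1$. The special cases named in the abstract differ in the distribution assumed for $\varepsilon_i$; in the usual convention, EACD takes it to be unit exponential.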
|
49 |
Time Series Data Analysis of Single Subject Experimental Designs Using Bayesian Estimation
Aerts, Xing Qin 08 1900
This study presents a set of data analysis approaches for single subject designs (SSDs). The primary purpose is to establish a series of statistical models that supplement visual analysis in single subject research using Bayesian estimation. Linear modeling approaches have been used to study level and trend changes. I propose an alternative approach that treats the phase change-point between the baseline and intervention conditions as an unknown parameter. Similar to some existing approaches, the models take into account changes in slopes and intercepts in the presence of serial dependency. The Bayesian procedure used to estimate the parameters and analyze the data is described. Researchers use a variety of statistical methods to analyze different single subject research designs. This dissertation presents a series of statistical models for data from various conditions: the baseline phase, A-B design, A-B-A-B design, multiple baseline design, alternating treatments design, and changing criterion design. The change-point evaluation method can provide additional confirmation of the causal effect of the treatment on the target behavior. Software code is provided as supplemental material in the appendices. The applicability of the analyses is demonstrated using five examples from the SSD literature.
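One possible way to write a model of the kind described in this abstract, for a single A-B design, is sketched below. The notation and the specific parameterisation are assumptions for illustration and are not taken from the dissertation: $\tau$ is the unknown change-point, $\beta_2$ and $\beta_3$ are the changes in intercept and slope after $\tau$, and $\rho$ captures first-order serial dependency.

```latex
\begin{align}
  y_t &= \beta_0 + \beta_1 t
        + \bigl[\beta_2 + \beta_3\,(t - \tau)\bigr]\,\mathbb{1}(t > \tau) + e_t, \\
  e_t &= \rho\, e_{t-1} + \varepsilon_t, \qquad
  \varepsilon_t \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

In a Bayesian treatment, priors would be placed on $(\beta_0, \ldots, \beta_3, \rho, \sigma, \tau)$, and a posterior for $\tau$ concentrated near the actual start of the intervention would be the kind of additional confirmation the abstract refers to.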
|
50 |
Implementation of Anomaly Detection on a Time-series Temperature Data set
Novacic, Jelena, Tokhi, Kablai January 2019
Today's society has become more aware of its surroundings and the focus has shifted towards green technology. The need for a better environmental impact in all areas is growing rapidly, and energy consumption is one of them. A simple solution for automatically controlling the energy consumption of smart homes is through software. With today's IoT technology and machine learning models, the movement towards software-based eco-living is growing. In order to control the energy consumption of a household, sudden abnormal behavior must be detected and adjusted to avoid unnecessary consumption. This thesis uses a time-series data set of temperature data to implement anomaly detection. Four models were implemented and tested: a Linear Regression model, the Pandas EWM function, an exponentially weighted moving average (EWMA) model and finally a probabilistic exponentially weighted moving average (PEWMA) model. Each model was tested using data sets from nine different apartments, from the same time period. Each model was then evaluated in terms of Precision, Recall and F-measure, with an additional evaluation of Linear Regression using the R^2 score. The results of this thesis show that, in terms of accuracy, PEWMA outperformed the other models. The EWMA model was slightly better than the Linear Regression model, followed by the Pandas EWM model.
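The EWMA approach named in this abstract can be illustrated with the sketch below. It is a generic exponentially weighted mean and variance detector and not the thesis's implementation; the function name, the parameter values (`alpha`, `threshold`, `warmup`) and the temperature readings are all hypothetical. Each new reading is compared with the running mean and flagged if it deviates by more than `threshold` running standard deviations. The probabilistic PEWMA variant is not shown.

```python
import math


def ewma_anomalies(values, alpha=0.3, threshold=3.0, warmup=5):
    """Return indices whose value deviates strongly from the running EWMA."""
    mean = values[0]
    var = 0.0
    flagged = []
    for t, x in enumerate(values[1:], start=1):
        std = math.sqrt(var)
        # Test the new point against statistics built from earlier points only.
        if t >= warmup and std > 0 and abs(x - mean) > threshold * std:
            flagged.append(t)
        # Incremental update of the exponentially weighted mean and variance.
        diff = x - mean
        incr = alpha * diff
        mean += incr
        var = (1 - alpha) * (var + diff * incr)
    return flagged


# Hypothetical indoor temperatures in degrees Celsius with one spike.
temps = [20.0, 20.1, 19.9, 20.0, 20.2, 19.8, 20.1, 30.0, 20.0, 19.9]
anomalies = ewma_anomalies(temps)
```

Note that in this simple form the flagged reading still updates the running statistics, so the estimated variance is inflated for some time after a spike; a production detector might exclude or down-weight flagged points.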
|