61

Interaktiv identifiering av avvikelser i mätdata från testning av kretskort / Interactive identification of anomalies in measurement data from circuit board testing

Berglund, Ebba, Kazemi, Baset January 2024 (has links)
Visualization is a powerful tool in data analysis, especially for identifying anomalies. The ability to efficiently identify faulty components in electronics can considerably improve production processes: by clearly displaying the correlation between faulty and working components, analysts can identify key components that cause defective products. Multivariate data and multivariate time series data place high demands on visualizations because of their complexity. High dimensionality can lead to problems such as overlap and hidden patterns, depending on the visualization technique used, so effective visualization of such data must show both trends over time and correlations between variables. The study was conducted in cooperation with the consulting company Syntronic AB to identify suitable visualization techniques for data gathered when testing circuit boards. The methodology used is design science, comprising a literature study, development of a prototype, and evaluation of the prototype. The prototype consists of three visualization techniques: categorical heatmap, parallel coordinates, and scatterplot. These techniques were systematically compared to assess their effectiveness. The evaluation combines quantitative methods (time measurements and surveys) with qualitative interviews. The study found categorical heatmaps effective for identifying correlations between anomalies in multivariate data: although all users found the visualization difficult to interpret at first glance, they considered it effective at displaying correlations between anomalies. Parallel coordinates were perceived as difficult to interpret and ineffective because of the high dimensionality, since all dimensions cannot be displayed simultaneously; interactive options such as a tree view for selecting which test points to display, instead of a slider, were suggested to improve usability and user experience. Scatterplots proved useful for analyzing individual test points and showed general trends in a clear and understandable way. The study also showed that interactivity affects the perception of visualizations: limited interactivity made a technique feel less useful for identifying relations between anomalies and less user-friendly.
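As a rough illustration of the categorical-heatmap idea described above (my own toy sketch, with made-up component names and pass/fail data, not the Syntronic prototype): rows are test points, columns are boards, and colours encode pass/fail, so co-occurring failures line up visually.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
boards = [f"board {i}" for i in range(12)]
testpoints = ["U1", "U2", "R5", "C3", "Q7"]   # hypothetical component names
fail = rng.random((len(testpoints), len(boards))) < 0.2
fail[2] |= fail[0]   # make R5 fail whenever U1 fails, a visible correlation

fig, ax = plt.subplots()
ax.imshow(fail, cmap="RdYlGn_r", aspect="auto")   # green = pass, red = fail
ax.set_xticks(range(len(boards)), boards, rotation=90)
ax.set_yticks(range(len(testpoints)), testpoints)
ax.set_title("Pass/fail per test point and board")
plt.tight_layout()
plt.show()
```

Rows whose red cells cluster in the same columns are exactly the anomaly correlations the heatmap is meant to expose.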
62

Efficient Resource Management : A Comparison of Predictive Scaling Algorithms in Cloud-Based Applications

Dahl, Johanna, Strömbäck, Elsa January 2024 (has links)
This study aims to explore predictive scaling algorithms used to predict and manage workloads in a containerized system. The goal is to identify which predictive scaling approach delivers the most effective results, contributing to research on cloud elasticity and resource management. This potentially leads to reduced infrastructure costs while maintaining efficient performance, enabling a more sustainable cloud-computing technology. The work involved the development and comparison of three different autoscaling algorithms with an interchangeable prediction component. For the predictive part, three different time-series analysis methods were used: XGBoost, ARIMA, and Prophet. A simulation system with the necessary modules was developed, as well as a designated target service to experience the load. Each algorithm's scaling accuracy was evaluated by comparing its suggested number of instances to the optimal number, with each instance representing a simulated CPU core. The results showed varying efficiency: XGBoost and Prophet excelled with richer datasets, while ARIMA performed better with limited data. Although XGBoost and Prophet maintained 100% uptime, this could lead to resource wastage, whereas ARIMA's lower uptime percentage possibly suggested a more resource-efficient, though less reliable, approach. Further analysis, particularly experimental investigation, is required to deepen the understanding of these predictors' influence on resource allocation.
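A stripped-down sketch of such an autoscaling loop (hypothetical capacity numbers, not the thesis code) shows why the prediction component is interchangeable: any of the three forecasters can supply the load estimate, and the scaling rule stays the same.

```python
import math

CORE_CAPACITY = 100.0   # requests/s one simulated CPU core serves (assumed)
HEADROOM = 1.2          # over-provision 20% against forecast error (assumed)

def suggest_instances(predicted_load: float) -> int:
    """Map a predicted load (requests/s) to a suggested instance count."""
    return max(1, math.ceil(predicted_load * HEADROOM / CORE_CAPACITY))

# The forecast could come from XGBoost, ARIMA, or Prophet alike.
forecast = [180.0, 240.0, 95.0]                   # next three intervals
plan = [suggest_instances(x) for x in forecast]   # -> [3, 3, 2]
```

Scaling accuracy can then be scored by comparing `plan` against the optimal instance counts computed in hindsight, as the study does.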
63

Monitoring energy performance in local authority buildings

Stuart, Graeme January 2011 (has links)
Energy management has been an important function of organisations since the oil crisis of the mid-1970s led to hugely increased energy costs. Although the financial costs of energy are still important, growing recognition of the environmental costs of fossil-fuel energy is becoming more important. Legislation is also a key driver: the UK has set an ambitious greenhouse gas (GHG) reduction target of 80% of 1990 levels by 2050 in response to a strong international commitment to reduce GHG emissions globally. This work is concerned with the management of energy consumption in buildings through the analysis of energy consumption data. Buildings are a key source of emissions, with a wide range of energy-consuming equipment (such as photocopiers, refrigerators, boilers, air-conditioning plant and lighting) delivering services to the building occupants. Energy wastage can be identified through an understanding of consumption patterns and, in particular, of changes in these patterns over time. Changes in consumption patterns may have any number of causes: a fault in heating controls; a boiler or lighting replacement scheme; or a change in working practice entirely unrelated to energy management. Standard data analysis techniques such as degree-day modelling and CUSUM provide a means to measure and monitor consumption patterns, but these techniques were designed for use with monthly billing data. Modern energy metering systems automatically generate data at half-hourly or better resolution, and the standard techniques are not designed to capture the detailed information contained in this comparatively high-resolution data. The introduction of automated metering also introduces the need for automated analysis. This work assumes that consumption patterns are generally consistent in the short term but will inevitably change, and that understanding these changes is critical to energy management. A novel statistical method is developed which builds automated event detection into a consumption modelling algorithm. Leicester City Council has provided half-hourly data from over 300 buildings covering up to seven years of consumption (a total of nearly 50 million meter readings). Automatic event detection pinpoints and quantifies over 5,000 statistically significant events in the Leicester dataset, and it is shown that the total impact of these events is a decrease in overall consumption. Viewing consumption patterns in this way allows for a new, event-oriented approach to energy management in which large datasets are automatically and rapidly analysed to produce summary meta-data describing their salient features. These event-oriented meta-data can be used to navigate the raw data event by event and are highly complementary to strategic energy management.
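The two standard techniques the abstract names combine naturally: a degree-day model predicts expected consumption from the weather, and CUSUM accumulates the residuals so that a sustained drift flags a change in the consumption pattern. A minimal sketch with synthetic monthly billing numbers (not the thesis code, which works on half-hourly data):

```python
import numpy as np

def degree_day_model(consumption, degree_days):
    # Fit consumption = base + slope * heating degree days (least squares).
    slope, base = np.polyfit(degree_days, consumption, 1)
    return base + slope * np.asarray(degree_days)

def cusum(consumption, expected):
    # Cumulative sum of residuals; sustained drift away from zero flags
    # a change in the underlying consumption pattern.
    return np.cumsum(np.asarray(consumption) - np.asarray(expected))

# Twelve months of synthetic data: kWh consumed and heating degree days.
kwh = np.array([520, 480, 410, 300, 220, 180, 175, 190, 260, 350, 450, 510])
hdd = np.array([380, 340, 290, 190, 110, 60, 50, 65, 140, 230, 320, 370])
track = cusum(kwh, degree_day_model(kwh, hdd))
print(track)   # a persistent upward slope would indicate growing wastage
```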
64

Learning and smoothing in switching Markov models with copulas

Zheng, Fei 18 December 2017 (has links)
Switching Markov models, also called jump Markov systems (JMS), are widely used in many fields such as target tracking, seismic signal processing and finance, since they can approximate non-Gaussian non-linear systems. A considerable amount of related work studies linear JMS in which data restoration is achieved by Markov chain Monte Carlo (MCMC) methods. In this dissertation, we seek alternative restoration solutions for JMS that avoid MCMC methods. The main contribution of this work has two parts. First, an algorithm of unsupervised restoration for a recent linear JMS known as the conditionally Gaussian pairwise Markov switching model (CGPMSM) is proposed. This algorithm combines a parameter estimation method named Double EM, based on applying the Expectation-Maximization (EM) principle twice sequentially, with an efficient approach for smoothing with the estimated parameters. Second, we extend a specific sub-model of CGPMSM, the conditionally Gaussian observed Markov switching model (CGOMSM), to a more general one, the generalized conditionally observed Markov switching model (GCOMSM), by introducing copulas. Compared to CGOMSM, the proposed GCOMSM admits inherently more flexible distributions and non-linear structures while optimal and fast restoration remains feasible. In addition, an identification method called GICE-LS, based on the generalized iterative conditional estimation (GICE) and linear least-squares (LS) principles, is proposed so that GCOMSM can approximate non-Gaussian non-linear systems from their sample data sets. All proposed methods are evaluated on simulated data. Moreover, the performance of GCOMSM is discussed through application to other general non-Gaussian non-linear Markov models, for example stochastic volatility models, which are of great importance in finance.
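The copula construction at the heart of GCOMSM can be illustrated in a few lines. The toy sketch below (my own example, not from the dissertation) draws pairs whose dependence comes from a Gaussian copula while the margins are deliberately non-Gaussian, which is exactly the extra flexibility copulas buy over jointly Gaussian links.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho = 0.7
cov = np.array([[1.0, rho], [rho, 1.0]])

z = rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=10_000)
u = stats.norm.cdf(z)            # uniforms carrying Gaussian dependence
x = stats.expon.ppf(u[:, 0])     # exponential margin
y = stats.t.ppf(u[:, 1], df=3)   # heavy-tailed Student-t margin
# (x, y) are dependent through the copula yet keep their chosen margins.
```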
65

過濾靴帶反覆抽樣與一般動差估計式 / Sieve Bootstrap Inference Based on GMM Estimators of Time Series Data

劉祝安, Liu, Chu-An Unknown Date (has links)
In this paper, we propose two types of sieve bootstrap, a univariate and a multivariate approach, for generalized method of moments (GMM) estimators of time series data. Compared with the nonparametric block bootstrap, the sieve bootstrap is in essence parametric, which helps fit the data better when researchers have prior information about the time series properties of the variables of interest. Our Monte Carlo experiments show that the performance of these two types of sieve bootstrap is comparable to that of the block bootstrap. Furthermore, unlike the block bootstrap, which is sensitive to the choice of block length, the two types of sieve bootstrap are less sensitive to the choice of lag length.
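A minimal univariate sieve bootstrap sketch (illustrative only, not the paper's implementation): fit an AR(p) "sieve", resample its centred residuals i.i.d., and rebuild bootstrap series that preserve the fitted serial dependence.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def sieve_bootstrap(y, p, n_boot=200, seed=0):
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    fit = AutoReg(y, lags=p).fit()
    const, phi = fit.params[0], fit.params[1:]    # intercept, AR coefficients
    resid = np.asarray(fit.resid)
    resid = resid - resid.mean()                  # centre the residuals
    reps = []
    for _ in range(n_boot):
        e = rng.choice(resid, size=len(y) + p, replace=True)
        x = list(y[:p])                           # warm start from the data
        for t in range(p, len(y) + p):
            lags = np.array(x[-p:][::-1])         # most recent lag first
            x.append(const + phi @ lags + e[t])
        reps.append(np.array(x[p:]))
    return np.array(reps)
```

Each bootstrap replicate is then fed to the GMM estimator, and the spread of the re-estimates approximates its sampling distribution; lag length `p` plays the role that block length plays in the block bootstrap.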
66

Combinaison de l’Internet des objets, du traitement d’évènements complexes et de la classification de séries temporelles pour une gestion proactive de processus métier / Combining the Internet of things, complex event processing, and time series classification for a proactive business process management.

Mousheimish, Raef 27 October 2017 (has links)
The Internet of Things is at the core of smart industrial processes thanks to its capacity for event detection from data conveyed by sensors. However, much remains to be done to make the most of this recent technology and make it scale. This thesis aims at filling the gap between the massive data flows collected by sensors and their effective exploitation in business process management. It proposes a global approach that combines stream data processing, supervised learning and/or complex event processing rules that make it possible to predict (and thereby avoid) undesirable events, and finally business process management extended with these complex rules. The scientific contributions of this thesis lie in several topics: making business processes more intelligent and more dynamic; automating complex event processing by learning the rules; and, last but not least, data mining of multivariate time series for early prediction of risks. The target application of this thesis is the instrumented transportation of artworks.
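A toy complex-event-processing rule in the spirit described above (hypothetical sensor names and thresholds, not from the thesis): watch a stream of readings and raise an alert before a damaging event, here when vibration trends upward while humidity drifts out of range.

```python
from collections import deque

WINDOW = 10
vibration = deque(maxlen=WINDOW)

def on_reading(reading):
    """reading: dict with 'vibration' (g) and 'humidity' (%RH) fields."""
    vibration.append(reading["vibration"])
    rising = (len(vibration) == WINDOW
              and vibration[-1] > 1.5 * sum(vibration) / WINDOW)
    humid_out = not (40 <= reading["humidity"] <= 60)
    if rising and humid_out:
        return "ALERT: risk pattern detected, reroute or repack the artwork"
    return None
```

In the thesis's approach such rules are not hand-written but learned from labelled multivariate time series, which is what makes the business process proactive rather than reactive.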
67

Multivariate Time Series Data Generation using Generative Adversarial Networks : Generating Realistic Sensor Time Series Data of Vehicles with an Abnormal Behaviour using TimeGAN

Nord, Sofia January 2021 (has links)
Large datasets are a crucial requirement for achieving high performance, accuracy, and generalisation in any machine learning task, such as prediction or anomaly detection. However, it is not uncommon for datasets to be small or imbalanced, since gathering data can be difficult, time-consuming, and expensive. In the task of collecting vehicle sensor time series data, in particular when the vehicle has an abnormal behaviour, these struggles are present and may hinder the automotive industry in its development. Synthetic data generation has become a growing interest among researchers in several fields as a way to handle the struggles of data gathering. Among the methods explored for generating data, generative adversarial networks (GANs) have become a popular approach due to their wide application domain and successful performance. This thesis focuses on generating multivariate time series data similar to vehicle sensor readings of the air pressures in the brake system of vehicles with an abnormal behaviour, meaning there is a leakage somewhere in the system. A GAN architecture called TimeGAN was trained to generate such data and was then evaluated using both qualitative and quantitative evaluation metrics. Two versions of this model were tested and compared. The results showed that both models learnt the distribution and the underlying information within the features of the real data. The goal of the thesis was achieved, and the work can become a foundation for future research in this field.
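One common quantitative check for GAN-generated time series is a discriminative score, used in the original TimeGAN paper among others: train a classifier to tell real windows from synthetic ones, where accuracy near 0.5 means the synthetic data is hard to distinguish. A sketch under that assumption (not the thesis code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def discriminative_score(real, synthetic, seed=0):
    # real, synthetic: arrays of shape (n_windows, window_len * n_channels),
    # i.e. each time series window flattened into one feature vector.
    X = np.vstack([real, synthetic])
    y = np.r_[np.ones(len(real)), np.zeros(len(synthetic))]
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                          random_state=seed)
    clf = RandomForestClassifier(random_state=seed).fit(Xtr, ytr)
    return abs(clf.score(Xte, yte) - 0.5)   # 0.0 = indistinguishable
```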
68

Assessing Query Execution Time and Implementational Complexity in Different Databases for Time Series Data / Utvärdering av frågeexekveringstid och implementeringskomplexitet i olika databaser för tidsseriedata

Jama Mohamud, Nuh, Söderström Broström, Mikael January 2024 (has links)
Traditional database management systems are designed for general-purpose data handling and fail to work efficiently with time series data, which is characterised by high volume, rapid ingestion rates, and a focus on temporal relationships. However, which solution is best is not a trivial question to answer. Hence, this thesis analyses four different database management systems (DBMSs) to determine their suitability for managing time series data, with a specific focus on Internet of Things (IoT) applications. The DBMSs examined are PostgreSQL, TimescaleDB, ClickHouse, and InfluxDB. The thesis evaluates query performance across varying dataset sizes and time ranges, as well as the implementational complexity of each DBMS. The benchmarking results indicate that InfluxDB consistently delivers the best performance, though it involves higher implementational complexity and time consumption. ClickHouse emerges as a strong alternative with the second-best performance and the simplest implementation. The thesis also identifies potential biases in the benchmarking tools and suggests that TimescaleDB's performance may have been affected by configuration errors. The findings provide significant insights into the performance metrics and implementation challenges of the selected DBMSs. Despite limitations in fully addressing the research questions, the thesis offers a valuable overview of the examined DBMSs in terms of performance and implementational complexity; these results should be considered alongside additional research when selecting a DBMS for time series data.
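A minimal query-timing sketch in the spirit of such a benchmark (hypothetical DSN, table and column names; not the thesis harness). It targets TimescaleDB, whose `time_bucket` function aggregates by interval; plain PostgreSQL would use `date_trunc` instead, and the other two systems have analogous client drivers.

```python
import time
import psycopg2

conn = psycopg2.connect("dbname=tsdb user=bench")   # assumed connection
query = """
    SELECT time_bucket('1 hour', ts) AS bucket, avg(value)
    FROM sensor_readings
    WHERE ts >= now() - interval '7 days'
    GROUP BY bucket ORDER BY bucket;
"""
with conn.cursor() as cur:
    t0 = time.perf_counter()
    cur.execute(query)
    rows = cur.fetchall()            # include fetch time in the measurement
    elapsed = time.perf_counter() - t0
print(f"{len(rows)} rows in {elapsed:.3f}s")
```

Repeating such measurements across dataset sizes and time ranges, with warm-up runs discarded, is what yields comparable numbers across the four systems.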
69

Abandoned by Home and Burden of Host: Evaluating States' Economic Ability and Refugee Acceptance through Panel Data Analysis

Tabassum, Ummey Hanney January 2018 (has links)
No description available.
70

Sign of the Times : Unmasking Deep Learning for Time Series Anomaly Detection / Skyltarna på Tiden : Avslöjande av djupinlärning för detektering av anomalier i tidsserier

Richards Ravi Arputharaj, Daniel January 2023 (has links)
Time series anomaly detection has been a longstanding area of research with applications across various domains, and recent years have seen a surge of interest in applying deep learning models to the problem. This thesis presents a critical examination of the efficacy of deep learning models in comparison to classical approaches for time series anomaly detection. Contrary to the widespread belief in the superiority of deep learning models, our findings suggest that their reported performance may be misleading and the progress illusory. Through rigorous experimentation and evaluation, we show that classical models outperform their deep learning counterparts in various scenarios, challenging the prevailing assumptions. Beyond model performance, our study examines the evaluation metrics commonly employed in time series anomaly detection and uncovers how they inadvertently inflate the performance scores of models, potentially leading to misleading conclusions. By identifying and addressing these issues, our research provides valuable insights for researchers, practitioners, and decision-makers in the field, encouraging a critical reevaluation of the role of deep learning models and of the metrics used to assess their performance.
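One metric practice widely blamed for inflated scores in this literature is the "point adjustment" protocol (an assumption that this is among the issues the thesis examines, sketched here for illustration): if a detector flags any point inside a labelled anomaly segment, every point of that segment is counted as detected, so even near-random detectors can reach high F1.

```python
import numpy as np

def point_adjust(pred, label):
    """pred, label: binary arrays; returns point-adjusted predictions."""
    pred, label = np.asarray(pred).copy(), np.asarray(label)
    # Locate contiguous anomalous segments in the ground truth.
    edges = np.flatnonzero(np.diff(np.r_[0, label, 0]))
    for start, end in zip(edges[::2], edges[1::2]):
        if pred[start:end].any():   # one hit anywhere in the segment...
            pred[start:end] = 1     # ...credits the whole segment
    return pred
```

Scoring F1 on `point_adjust(pred, label)` rather than `pred` is exactly the kind of inflation that makes model comparisons under this protocol unreliable.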
