51

Experimental Study on Machine Learning with Approximation to Data Streams

Jiang, Jiani January 2019 (has links)
Real-time transfer of data streams enables many data analytics and machine learning applications in areas such as massive IoT and industrial automation. The large data volume of these streams is a significant burden not only on the transport network but also on the corresponding application servers. Researchers therefore focus on reducing the amount of data that needs to be transferred, using data compression and approximation. Compression techniques such as lossy compression can significantly reduce data volume at the price of information loss, and how the compression is done depends strongly on the target application. When the decompressed data are then used in a data analysis application such as machine learning, the results may be affected by that information loss. In this thesis the author studies the impact of data compression on machine learning applications. In particular, it shows experimentally the trade-off between the approximation error bound, the compression ratio, and the prediction accuracy of several machine learning methods. The author argues that, with a proper choice of compression, the amount of data transferred can be reduced dramatically with limited impact on the machine learning applications. / Realtidsöverföring av dataströmmar möjliggör många dataanalyser och maskininlärningsapplikationer inom områdena t.ex. massiv IoT och industriell automatisering. Stor datavolym för dessa strömmar är en betydande börda eller omkostnad inte bara för transportnätet utan också för motsvarande applikationsservrar. Därför fokuserar forskare och forskare om att minska mängden data som behövs för att överföras via datakomprimeringar och approximationer. Datakomprimeringstekniker som förlustkomprimering kan minska datavolymen betydligt med priset för datainformation. Samtidigt är datakomprimering mycket beroende av motsvarande applikationer. Men när du använder dekomprimerade data i en viss dataanalysapplikation som maskininlärning, kan resultaten påverkas på grund av informationsförlusten. I denna artikel gjorde författaren en studie om effekterna av datakomprimering på maskininlärningsapplikationerna. I synnerhet, från det experimentella perspektivet, visar det avvägningen mellan tillnärmningsfelbundet, kompressionsförhållande och förutsägbarhetsnoggrannheten för flera maskininlärningsmetoder. Författaren anser att datakomprimering med rätt val dramatiskt kan minska mängden data som överförs med begränsad inverkan på maskininlärningsapplikationerna.
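The abstract describes the trade-off between error bound, compression ratio, and prediction accuracy only qualitatively. The following is a minimal illustrative sketch of that kind of experiment; the error-bounded quantizer, the digits dataset, and the random-forest classifier are assumptions standing in for the thesis's streaming setup, not the author's actual pipeline.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def quantize(X, error_bound):
    """Error-bounded lossy 'compression': snap every value to a grid whose
    spacing guarantees |x - x_hat| <= error_bound."""
    step = 2.0 * error_bound
    return np.round(X / step) * step

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for eps in [0.0, 0.5, 2.0, 8.0]:
    Xtr = quantize(X_train, eps) if eps > 0 else X_train
    Xte = quantize(X_test, eps) if eps > 0 else X_test
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, y_train)
    acc = accuracy_score(y_test, model.predict(Xte))
    # Fewer distinct values is a crude proxy for how compressible the data became.
    print(f"error bound {eps:4.1f} -> distinct values {len(np.unique(Xtr)):4d}, "
          f"test accuracy {acc:.3f}")
```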
52

Implementation of Hierarchical and K-Means Clustering Techniques on the Trend and Seasonality Components of Temperature Profile Data

Ogedegbe, Emmanuel 01 December 2023 (has links) (PDF)
In this study, time series decomposition techniques are applied to climate data in conjunction with two well-known clustering algorithms, K-means clustering and hierarchical clustering, and their implementations and results are compared. The main objective is to identify similar climate trends and to group geographical areas with similar environmental conditions. Climate data from selected locations are collected and analyzed, and each time series is split into trend, seasonality, and residual components. The decomposed series are then submitted to K-means clustering and to hierarchical clustering with dynamic time warping in order to categorize growing regions according to their climatic tendencies. Finally, the resulting clusters are evaluated to understand how the climates of different regions compare and how regions cluster when grouped by the overall trend of the temperature profile over the full growing season as opposed to its seasonality component.
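As a rough illustration of the decompose-then-cluster pipeline the abstract describes, the sketch below decomposes synthetic temperature profiles and clusters their trend components. The synthetic data, the cluster count, and the use of Euclidean Ward linkage in place of dynamic time warping are assumptions made for brevity, not the thesis implementation.

```python
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
n_sites, n_days = 12, 3 * 365
t = np.arange(n_days)

# Synthetic daily temperature profiles: site-specific trend + shared annual seasonality + noise.
series = np.array([
    10 + 0.002 * slope * t
    + 8 * np.sin(2 * np.pi * t / 365)
    + rng.normal(0, 1, n_days)
    for slope in rng.choice([-2, 0, 3], size=n_sites)
])

# Decompose each site's series and keep only the trend component.
trends = np.array([
    seasonal_decompose(s, period=365, extrapolate_trend="freq").trend
    for s in series
])

# Cluster sites by their trend components with both algorithms.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(trends)
hc_labels = fcluster(linkage(trends, method="ward"), t=3, criterion="maxclust")
print("K-means clusters:     ", km_labels)
print("Hierarchical clusters:", hc_labels)
```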
53

Extraction of Global Features for enhancing Machine Learning Performance / Extraktion av Globala Egenskaper för förbättring av Maskininlärningsprestanda

Tesfay, Abyel January 2023 (has links)
Data science plays an essential role in helping organizations and industries become data-driven in their decision-making and workflows, as models can provide relevant input in areas such as social media, the stock market, and manufacturing. To train models of high quality, data preparation methods such as feature extraction are used to obtain relevant features. However, global features are often ignored when feature extraction is performed on time-series datasets. This thesis investigates how state-of-the-art tools and methods in data preparation and analytics can be used to extract global features, and evaluates whether such data can improve the performance of ML models. Global features are pieces of information that summarize a full dataset, such as the mean and median of a numeric dataset. They can be used as inputs to help models understand the dataset and generalize better to new data. The thesis began with a literature study of feature extraction methods, time-series data, the definition of global features, and their benefits in bioprocessing. Global features were then analyzed and extracted using tools and methods for data manipulation and feature extraction. The data used in the study consist of bioprocessing measurements of E. coli cell growth as time series. The global features were evaluated through a performance comparison between models trained on the dataset combined with global features and models trained on the dataset alone. The study presents a method to extract global features with open-source tools and libraries, namely Python together with the NumPy, pandas, Matplotlib, and scikit-learn libraries. The quality of the global features depends on experience in data science, the complexity of the data structure, and domain knowledge. The results show that the best models, trained on the dataset and global features combined, perform on average 15-18% better than models trained only on the dataset. The gain depends on the type and number of global features combined with the dataset. Global features could be useful in manufacturing industries such as pharmaceuticals and chemicals by helping models predict the inputs that lead to desired trends and outputs, which could in turn promote sustainable production. / Datavetenskap spelar en stor roll inom många organsationer och industrier för att bli data-drivna inom beslutsfattande och arbetsflöde, varav maskininlärningsmodeller kan ge relevanta förslag inom områden som social media, aktiemarknaden samt tillverkningsindustrin. För att träna kvalitativa modeller används dataförberedande verktyg som funktionsextraktion för att utvinna relevanta egenskaper från data. Dock tar man ej hänsyn till globala egenskaper när funktionsextraktion utförs på tidsserie data. Denna examensarbete undersöker hur nuvarande verktyg inom dataförberededning och analys can användas för att utvinna global funktioner och utvärderar om sådan data kan förbättra prestandan hos maskinlärningsmodeller. Globla funktioner beskriver information som sammanfattar hel data, till exempel medelvärdet och medianen. De kan användas som indata för att få modeller förstå data och generalizera bättre mot ny data. Först utfördes en litteraturstudie inom metoder för funktionsextraktion, tidsserie data, definition av globala egenskaper samt möjligheter inom bioutvinning.
Därefter utfördes en analys och utvinning av globala egenskaper med verktyg och metoder för data manipulation och funktionsutvinning. Den data som användes i arbetet består av mätningar från bioutvinning av E. Coli bakterier i form av tidsserie data. De globala funktionerna utvärderades genom en jämnförelse mellan modeller tränade på kombination av hel data och globala funktioner, och modeller tränade enbart på hel data. Studien presenterar en metod för att extrahera globala funktioner med öppet tillgänglig verktyg och bibliotek, som Python språket och Numpy, Pandas, Matplot och Scikit bibloteken. Kvaliteten på de globala funktionerna baseras på erfarenheten inom datavetenskap, datas komplexitet samt förståelse för domänområdet. Resultat visar att de bästa modellerna, tränade på data och globala funktioner, presterar i genomsnitt 15-18% bättre än modeller som tränats enbart på hel data. Prestandan detta beror på typen och antalet globala funktioner som kobineras med ursprungliga datat. Globala funktioner kan vara till nytta inom tillverkningsindustrier som farmaceutisk eller kemiska, genom att hjälpa modeller att förutsäga ingångsparametrar som leder till önskad produktion. Detta kan bidra till en hållbar produktion imon flera industrier.
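A minimal sketch of the comparison described above is given below, using the kinds of libraries the abstract names. The synthetic cultivation runs, the particular global features, and the random-forest regressor are assumptions, not the thesis's actual data or models.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

def global_features(ts):
    """Summarize an entire time series into a few global features."""
    slope = np.polyfit(np.arange(len(ts)), ts, 1)[0]
    return {"g_mean": ts.mean(), "g_median": np.median(ts), "g_std": ts.std(),
            "g_min": ts.min(), "g_max": ts.max(), "g_slope": slope}

# Synthetic "cultivation runs": each run is a 50-step measurement series plus a
# run-level target that depends on the run's overall growth rate.
rows, n_runs, n_steps = [], 200, 50
for run in range(n_runs):
    growth = rng.uniform(0.5, 2.0)
    ts = growth * np.log1p(np.arange(n_steps)) + rng.normal(0, 0.1, n_steps)
    target = 10 * growth + rng.normal(0, 0.5)
    gfeat = global_features(ts)
    for step, value in enumerate(ts):
        rows.append({"run": run, "step": step, "value": value, "target": target, **gfeat})

df = pd.DataFrame(rows)
local_cols = ["step", "value"]
global_cols = [c for c in df.columns if c.startswith("g_")]
train_runs, test_runs = train_test_split(np.arange(n_runs), random_state=0)
train, test = df[df["run"].isin(train_runs)], df[df["run"].isin(test_runs)]

for cols, name in [(local_cols, "local features only"), (local_cols + global_cols, "local + global")]:
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(train[cols], train["target"])
    print(f"{name:20s} R^2 = {r2_score(test['target'], model.predict(test[cols])):.3f}")
```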
54

CSAR: The Cross-Sectional Autoregression Model

Lehner, Wolfgang, Hartmann, Claudio, Hahmann, Martin, Habich, Dirk 18 January 2023 (has links)
The forecasting of time series data is an integral component of management, planning, and decision making. Following the Big Data trend, large amounts of time series data are available in many application domains. The highly dynamic and often noisy character of these domains, in combination with the logistic problems of collecting data from a large number of data sources, imposes new requirements on the forecasting process. A constantly increasing number of time series has to be forecast, preferably with both low latency and high accuracy. This is almost impossible when keeping the traditional focus on creating one forecast model for each individual time series. In addition, frequently used forecasting approaches such as ARIMA need complete historical data to train forecast models and fail if time series are intermittent. A method that addresses all of these new requirements is the cross-sectional forecasting approach. It utilizes available data from many time series of the same domain in one single model; thus, missing values can be compensated for and accurate forecast results can be calculated quickly. However, this approach is limited by a rigid training data selection, and existing forecasting methods show that adaptability of the model to the data increases forecast accuracy. Therefore, in this paper we present CSAR, a model that extends the cross-sectional paradigm by adding more flexibility and allowing fine-grained adaptation to the analyzed data. In this way we achieve increased forecast accuracy and thus wider applicability.
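CSAR itself is not specified in the abstract, but the cross-sectional paradigm it extends can be sketched as follows: windows from many related series are pooled into one training set for a single model. Everything below (the synthetic series, the ridge regressor, the lag window) is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
n_series, length, k = 50, 120, 7   # 50 related series, 120 points each, 7-lag window

# Synthetic domain: all series share a weekly pattern, with per-series level and noise.
base = 10 + 3 * np.sin(2 * np.pi * np.arange(length) / 7)
series = base + rng.normal(0, 0.5, (n_series, length)) + rng.uniform(-2, 2, (n_series, 1))

# Cross-sectional training set: one pooled model over windows from *all* series.
X, y = [], []
for s in series:
    for t in range(k, length - 1):
        X.append(s[t - k:t])
        y.append(s[t])
model = Ridge().fit(np.array(X), np.array(y))

# One-step forecast for every series from its latest window, in a single call.
latest = np.array([s[-k:] for s in series])
print(model.predict(latest)[:5])
```

The appeal of the pooled model is that a series with gaps still contributes (and receives) forecasts through whatever windows it does have, which is what makes the approach robust to intermittent data.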
55

Modeling Credit Default Swap Spreads with Transformers : A Thesis in collaboration with Handelsbanken / Modellera Kreditswapp spreadar med Transformers : Ett projekt I samarbete med Handelsbanken

Luhr, Johan January 2023 (has links)
In the aftermath of the credit crisis in 2007, the importance of Credit Valuation Adjustment (CVA) rose in the Over The Counter (OTC) derivative pricing process. One important part of the pricing process is to determine the Probabilities of Default (PDs) of the counterparty in question. The normal way of doing this is to use Credit Default Swap (CDS) spreads from the CDS market. In some cases there is no associated liquid CDS market, and it is then market practice to use proxy CDS spreads. In this thesis, transformer models are used to generate proxy CDS spreads for a given region, rating, and tenor from stand-alone CDS spread data. Two different models are created for this. The first, simpler model is an encoder-based model that uses stand-alone CDS data from a single company to generate one proxy spread per inference. The second, more advanced model is an encoder-decoder model that uses stand-alone CDS data from three companies to generate one proxy spread per inference. The performance of the models is compared, and it is shown that the more advanced model outperforms the simpler one; it should be noted, however, that the simpler model is faster to train. Both models could be used for data validation. To create the transformer models, it was necessary to implement custom embeddings that embed company-specific information and temporal information about the CDS spreads. The importance of the different embeddings was also investigated, and it is clear that certain embeddings matter more than others. / I efterdyningarna av kreditkrisen 2007 så ökade betydelsen av CVA vid prissättning av OTC derivat. En viktig del av prissättningen av OTC derivat är att avgöra PDs för den aktuella motparten. Om det finns en likvid CDS marknad för motparten så kan man använda sig av CDSs spreadar dirket från marknaden för att avgöra PDs. I många fall så saknas en sådan likvid CDS marknad. Då är det praksis att istället använda sig av proxy CDS spreadar. I den här uppsatsen så presenteras två transformer modeller för att generera proxy CDS spreadar för bestämda kombinationer av region, rating och löptid från enskilda företags CDS spreadar. Den först enklare modellen är en encoder baserad modell som använder sig av data från ett enskilt företag för att generera en proxy spread per inferens. Den andra modellen är en mer avancerad encoder-decoder modell. Den mer avancerade modellen använder sig av data från tre företag för att generera en proxy spread. I uppsatsen jämförs dessa modeller och man kan konstatera att den mer avancereade modellen genererar mer exakta CDS spreadar. Den enklare modellen är dock betydligt enklare att träna och båda modellerna kan användas i syfte att validera det riktiga proxy datat. För att kunna skapa modellerna så var det en nödvändighet att implementera specialbyggda embeddings som kodad in temporal information och företagsspecifik information om CDS spreadarna. Dessutom så testades vikten av enskilda embeddings och det var uppenbart att vissa embeddings var viktigare än andra.
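The abstract does not give the architecture in detail; the sketch below is a toy PyTorch encoder, assumed purely for illustration, that combines a spread sequence with categorical (region, rating, tenor) and positional embeddings to regress a single proxy spread. The layer sizes, vocabulary sizes, and mean pooling are all assumptions, not the thesis models.

```python
import torch
import torch.nn as nn

class ProxySpreadEncoder(nn.Module):
    """Toy encoder: embeds a company's CDS spread history together with
    categorical metadata and regresses one proxy spread."""
    def __init__(self, n_regions=5, n_ratings=8, n_tenors=6, d_model=32):
        super().__init__()
        self.spread_proj = nn.Linear(1, d_model)            # numeric spreads -> d_model
        self.region_emb = nn.Embedding(n_regions, d_model)  # categorical embeddings
        self.rating_emb = nn.Embedding(n_ratings, d_model)
        self.tenor_emb = nn.Embedding(n_tenors, d_model)
        self.pos_emb = nn.Embedding(512, d_model)            # temporal (position) embedding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, spreads, region, rating, tenor):
        # spreads: (batch, seq_len); region/rating/tenor: (batch,)
        b, t = spreads.shape
        pos = torch.arange(t, device=spreads.device).expand(b, t)
        meta = self.region_emb(region) + self.rating_emb(rating) + self.tenor_emb(tenor)
        x = self.spread_proj(spreads.unsqueeze(-1)) + self.pos_emb(pos) + meta.unsqueeze(1)
        h = self.encoder(x)                                  # (batch, seq_len, d_model)
        return self.head(h.mean(dim=1)).squeeze(-1)          # pool -> one proxy spread

model = ProxySpreadEncoder()
spreads = torch.randn(4, 60)                                 # 4 companies, 60 spreads each
region = torch.tensor([0, 1, 2, 0])
rating = torch.tensor([3, 3, 5, 1])
tenor = torch.tensor([2, 2, 2, 2])
print(model(spreads, region, rating, tenor).shape)           # torch.Size([4])
```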
56

Restaurant Daily Revenue Prediction : Utilizing Synthetic Time Series Data for Improved Model Performance

Jarlöv, Stella, Svensson Dahl, Anton January 2023 (has links)
This study aims to enhance the accuracy of a demand forecasting model, XGBoost, by incorporating synthetic multivariate restaurant time series data during the training process. The research addresses the limited availability of training data by generating synthetic data using TimeGAN, a generative adversarial deep neural network tailored for time series data. A one-year daily time series dataset, comprising numerical and categorical features based on a real restaurant's sales history, supplemented by relevant external data, serves as the original data. TimeGAN learns from this dataset to create synthetic data that closely resembles the original data in terms of temporal and distributional dynamics. Statistical and visual analyses demonstrate a strong similarity between the synthetic and original data. To evaluate the usefulness of the synthetic data, an experiment is conducted where varying lengths of synthetic data are iteratively combined with the one-year real dataset. Each iteration involves retraining the XGBoost model and assessing its accuracy for a one-week forecast using the Root Mean Square Error (RMSE). The results indicate that incorporating 6 years of synthetic data improves the model's performance by 65%. The hyperparameter configurations suggest that deeper tree structures benefit the XGBoost model when synthetic data is added. Furthermore, the model exhibits improved feature selection with an increased amount of training data. This study demonstrates that incorporating synthetic data closely resembling the original data can effectively enhance the accuracy of predictive models, particularly when training data is limited.
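Below is a hedged sketch of the augmentation-and-evaluation loop described above. Since reproducing TimeGAN is out of scope here, noisier replicas of the real series stand in for its synthetic output, and the feature set and hyperparameters are assumptions rather than those used in the study.

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(5)

def make_restaurant_year(n_days=365, noise=1.0):
    """Toy daily revenue: weekly cycle + yearly cycle + noise."""
    t = np.arange(n_days)
    revenue = (30 + 10 * np.sin(2 * np.pi * t / 7)
               + 5 * np.sin(2 * np.pi * t / 365) + noise * rng.normal(size=n_days))
    X = np.column_stack([t % 7, t % 365])          # day-of-week, day-of-year features
    return X, revenue

X_real, y_real = make_restaurant_year()
X_test, y_test = make_restaurant_year()            # stand-in for the held-out forecast period

for years_synth in [0, 2, 6]:
    # Stand-in for TimeGAN output: noisier replicas of the real year.
    X_parts, y_parts = [X_real], [y_real]
    for _ in range(years_synth):
        Xs, ys = make_restaurant_year(noise=2.0)
        X_parts.append(Xs)
        y_parts.append(ys)
    X_train, y_train = np.vstack(X_parts), np.concatenate(y_parts)
    model = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{years_synth} synthetic years added -> RMSE {rmse:.2f}")
```

In the study itself the synthetic years come from TimeGAN rather than noisy replicas, which is what makes the augmentation informative rather than merely regularizing.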
57

Monitoring energy performance in local authority buildings

Stuart, Graeme January 2011 (has links)
Energy management has been an important function of organisations since the oil crisis of the mid-1970s led to hugely increased costs of energy. Although the financial costs of energy are still important, the growing recognition of the environmental costs of fossil-fuel energy is becoming more important, and legislation is also a key driver. The UK has set an ambitious greenhouse gas (GHG) reduction target of 80% of 1990 levels by 2050 in response to a strong international commitment to reduce GHG emissions globally. This work is concerned with the management of energy consumption in buildings through the analysis of energy consumption data. Buildings are a key source of emissions, with a wide range of energy-consuming equipment, such as photocopiers, refrigerators, boilers, air-conditioning plant and lighting, delivering services to the building occupants. Energy wastage can be identified through an understanding of consumption patterns and, in particular, of changes in these patterns over time. Changes in consumption patterns may have any number of causes: a fault in heating controls; a boiler or lighting replacement scheme; or a change in working practice entirely unrelated to energy management. Standard data analysis techniques such as degree-day modelling and CUSUM provide a means to measure and monitor consumption patterns. These techniques were designed for use with monthly billing data, whereas modern energy metering systems automatically generate data at half-hourly or better resolution. Standard techniques are not designed to capture the detailed information contained in this comparatively high-resolution data, and the introduction of automated metering also introduces the need for automated analysis. This work assumes that consumption patterns are generally consistent in the short term but will inevitably change. A novel statistical method is developed which builds automated event detection into a consumption modelling algorithm. Understanding these changes to consumption patterns is critical to energy management. Leicester City Council has provided half-hourly data from over 300 buildings covering up to seven years of consumption (a total of nearly 50 million meter readings). Automatic event detection pinpoints and quantifies over 5,000 statistically significant events in the Leicester dataset, and it is shown that the total impact of these events is a decrease in overall consumption. Viewing consumption patterns in this way allows for a new, event-oriented approach to energy management where large datasets are automatically and rapidly analysed to produce summary meta-data describing their salient features. These event-oriented meta-data can be used to navigate the raw data event by event and are highly complementary to strategic energy management.
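As a small illustration of the CUSUM idea mentioned above (not the thesis's automated event-detection algorithm), the sketch below flags a sustained change in synthetic daily consumption data; the baseline period, the step change, and the argmax-based change-point estimate are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic daily electricity consumption for one building: two years of data
# with a sustained drop (e.g. a lighting retrofit) after day 500.
days = 730
consumption = 400 + 20 * rng.normal(size=days)
consumption[500:] -= 60

# CUSUM of deviations from a baseline estimated on an initial reference period.
baseline = consumption[:100].mean()
cusum = np.cumsum(consumption - baseline)

# For a downward shift in mean, the CUSUM peaks at the change point and then
# falls steadily; the argmax is the classic estimate of when the event occurred.
event_day = int(np.argmax(cusum))
saving = baseline - consumption[event_day:].mean()
print(f"change detected near day {event_day}, average saving {saving:.0f} kWh/day")
```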
58

Learning and smoothing in switching Markov models with copulas

Zheng, Fei 18 December 2017 (has links)
Les modèles de Markov à sauts (appelés JMS pour Jump Markov System) sont utilisés dans de nombreux domaines tels que la poursuite de cibles, le traitement des signaux sismiques et la finance, étant donné leur bonne capacité à modéliser des systèmes non-linéaires et non-gaussiens. De nombreux travaux ont étudié les modèles de Markov linéaires pour lesquels bien souvent la restauration de données est réalisée grâce à des méthodes d’échantillonnage statistique de type Markov Chain Monte-Carlo. Dans cette thèse, nous avons cherché des solutions alternatives aux méthodes MCMC et proposons deux originalités principales. La première a consisté à proposer un algorithme de restauration non supervisée d’un JMS particulier appelé « modèle de Markov couple à sauts conditionnellement gaussiens » (noté CGPMSM). Cet algorithme combine une méthode d’estimation des paramètres basée sur le principe Espérance-Maximisation (EM) et une méthode efficace pour lisser les données à partir des paramètres estimés. La deuxième originalité a consisté à étendre un CGPMSM spécifique appelé CGOMSM par l’introduction des copules. Ce modèle, appelé GCOMSM, permet de considérer des distributions plus générales que les distributions gaussiennes tout en conservant des méthodes de restauration optimales et rapides. Nous avons équipé ce modèle d’une méthode d’estimation des paramètres appelée GICE-LS, combinant le principe de la méthode d’estimation conditionnelle itérative généralisée et le principe des moindre-carrés linéaires. Toutes les méthodes sont évaluées sur des données simulées. En particulier, les performances de GCOMSM sont discutées au regard de modèles de Markov non-linéaires et non-gaussiens tels que la volatilité stochastique, très utilisée dans le domaine de la finance. / Switching Markov models, also called Jump Markov Systems (JMS), are widely used in many fields such as target tracking, seismic signal processing, and finance, since they can approximate non-Gaussian non-linear systems. A considerable amount of related work studies linear JMS in which data restoration is achieved by Markov Chain Monte-Carlo (MCMC) methods. In this dissertation, we look for restoration solutions for JMS that are alternatives to MCMC methods. The main contribution of our work comprises two parts. First, an algorithm for unsupervised restoration of a recent linear JMS known as the Conditionally Gaussian Pairwise Markov Switching Model (CGPMSM) is proposed. This algorithm combines a parameter estimation method named Double EM, which is based on the Expectation-Maximization (EM) principle applied twice sequentially, and an efficient approach for smoothing with the estimated parameters. Second, we extend a specific sub-model of CGPMSM known as the Conditionally Gaussian Observed Markov Switching Model (CGOMSM) to a more general one, named the Generalized Conditionally Observed Markov Switching Model (GCOMSM), by introducing copulas. Compared with CGOMSM, the proposed GCOMSM adopts inherently more flexible distributions and non-linear structures while keeping optimal restoration feasible. In addition, an identification method called GICE-LS, based on the Generalized Iterative Conditional Estimation (GICE) and Least-Squares (LS) principles, is proposed for GCOMSM to approximate non-Gaussian non-linear systems from their sample data. All proposed methods are tested by simulation. Moreover, the performance of GCOMSM is discussed through application to other general non-Gaussian non-linear Markov models, for example stochastic volatility models, which are of great importance in finance.
59

過濾靴帶反覆抽樣與一般動差估計式 / Sieve Bootstrap Inference Based on GMM Estimators of Time Series Data

劉祝安, Liu, Chu-An Unknown Date (has links)
In this paper, we propose two types of sieve bootstrap, a univariate and a multivariate approach, for generalized method of moments (GMM) estimators of time series data. Compared with the nonparametric block bootstrap, the sieve bootstrap is in essence parametric, which helps fit the data better when researchers have prior information about the time series properties of the variables of interest. Our Monte Carlo experiments show that the performance of these two types of sieve bootstrap is comparable to that of the block bootstrap. Furthermore, unlike the block bootstrap, which is sensitive to the choice of block length, the two sieve bootstrap variants are less sensitive to the choice of lag length.
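The sketch below illustrates the core univariate sieve bootstrap idea on the simplest possible moment estimator, the sample mean: fit an autoregressive approximation, resample its centered residuals, and rebuild bootstrap series recursively. The AR order, the simulated series, and the least-squares AR fit are assumptions; the paper's actual GMM setting and multivariate variant are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated AR(1) series; the "estimator" whose uncertainty we want is its mean.
n, phi = 300, 0.6
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

def fit_ar(y, p):
    """Least-squares AR(p) fit; returns intercept + lag coefficients and residuals."""
    Y = y[p:]
    X = np.column_stack([np.ones(len(Y))] + [y[p - j:-j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta, Y - X @ beta

p = 4                                    # sieve order (grows with n in theory)
beta, resid = fit_ar(y, p)
resid = resid - resid.mean()

boot_means = []
for _ in range(1000):
    eps = rng.choice(resid, size=n + 50)             # resample residuals i.i.d.
    yb = np.zeros(n + 50)
    for t in range(p, n + 50):
        yb[t] = beta[0] + beta[1:] @ yb[t - p:t][::-1] + eps[t]
    boot_means.append(yb[50:].mean())                # drop burn-in

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean {y.mean():.3f}, sieve-bootstrap 95% CI [{lo:.3f}, {hi:.3f}]")
```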
60

Combinaison de l’Internet des objets, du traitement d’évènements complexes et de la classification de séries temporelles pour une gestion proactive de processus métier / Combining the Internet of things, complex event processing, and time series classification for a proactive business process management.

Mousheimish, Raef 27 October 2017 (has links)
L’internet des objets est au coeur des processus industriels intelligents grâce à la capacité de détection d’évènements à partir de données de capteurs. Cependant, beaucoup reste à faire pour tirer le meilleur parti de cette technologie récente et la faire passer à l’échelle. Cette thèse vise à combler le gap entre les flux massifs de données collectées par les capteurs et leur exploitation effective dans la gestion des processus métier. Elle propose une approche globale qui combine le traitement de flux de données, l’apprentissage supervisé et/ou l’utilisation de règles sur des évènements complexes permettant de prédire (et donc éviter) des évènements indésirables, et enfin la gestion des processus métier étendue par ces règles complexes. Les contributions scientifiques de cette thèse se situent dans différents domaines : les processus métiers plus intelligents et dynamiques ; le traitement d’évènements complexes automatisé par l’apprentissage de règles ; et enfin et surtout, dans le domaine de la fouille de données de séries temporelles multivariées par la prédiction précoce de risques. L’application cible de cette thèse est le transport instrumenté d’oeuvres d’art. / The Internet of Things is at the core of smart industrial processes thanks to its capacity for event detection from data conveyed by sensors. However, much remains to be done to make the most of this recent technology and make it scale. This thesis aims at filling the gap between the massive data flows collected by sensors and their effective exploitation in business process management. It proposes a global approach which combines stream data processing, supervised learning, and/or the use of complex event processing rules to predict (and thereby avoid) undesirable events, and finally business process management extended with these complex rules. The scientific contributions of this thesis lie in several areas: making business processes more intelligent and more dynamic; automating complex event processing by learning the rules; and, last but not least, data mining for multivariate time series through early prediction of risks. The target application of this thesis is the instrumented transportation of artworks.
