11 |
Management of Time Series Data. Matus Castillejos, Abel, January 2006
Every day, large volumes of data are collected in the form of time series. Time series are collections of events or observations, predominantly numeric, recorded sequentially at regular or irregular time intervals. Time series are becoming increasingly important in nearly every organisation and industry, including banking, finance, telecommunications, and transportation. Banking institutions, for instance, rely on time series analysis to forecast economic indices, build financial market models, and record international trade operations. Time series are increasingly used in this kind of investigation and have become a valuable resource in today's organisations.
This thesis investigates and proposes solutions to some current and important issues in time series data management (TSDM), using the Design Science Research Methodology. The thesis presents new models for mapping time series data to relational databases; these models optimise the use of disk space, handle different time granularities and status attributes, and facilitate time series data manipulation in a commercial Relational Database Management System (RDBMS). The new models provide a good solution for current time series database applications built on an RDBMS, and they are tested through a case study and a prototype using financial time series information. The thesis also includes a temporal data model that illustrates the lifetime behaviour of time series data. This model is based on a new set of time dimensions (confidentiality, definitiveness, validity, and maturity times), introduced specifically for managing time series data so that the different statuses of time series data can be represented correctly on a timeline. The proposed temporal data model gives a clear and accurate picture of the time series data lifecycle, and formal definitions of these time series dimensions are also presented. In addition, a time series grouping mechanism in an extensible commercial relational database system is defined, illustrated, and justified. The extension consists of a new data type, referred to as GroupTimeSeries, and a rich set of corresponding routines that support modelling and operating on time series information at a higher level of abstraction, extending the database server's capability to organise and manipulate time series in groups. The thesis presents the architecture, support functions, and operations of GroupTimeSeries, together with implementation options for the data type in relational-based technologies.
Finally, a framework for TSDM is defined that is expressive enough to capture the main requirements of time series applications and the management of their data. The framework aims to provide initial domain know-how and requirements for time series data management, avoiding the impracticability of designing a TSDM system on paper from scratch. It addresses many aspects of time series applications, including how time series data are organised at the conceptual level. The central abstractions of the proposed domain-specific framework are the notions of business sections, groups of time series, and the time series itself. The framework integrates a comprehensive specification of the structural and functional aspects of time series data management. A formal specification of the framework using conceptual graphs is also explored.
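As a rough illustration of mapping time series to a relational schema with a granularity attribute, a per-observation status attribute, and named groups of series, the sketch below uses Python's built-in `sqlite3`. It is a generic layout assumed for illustration, not the thesis's mapping models or its GroupTimeSeries data type; every table, column, and function name is hypothetical.

```python
import sqlite3

# Hypothetical schema for illustration only; not the thesis's mapping models.
SCHEMA = """
CREATE TABLE series (
    series_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    granularity TEXT NOT NULL CHECK (granularity IN ('day', 'month', 'quarter'))
);
CREATE TABLE observation (
    series_id INTEGER NOT NULL REFERENCES series(series_id),
    obs_time  TEXT NOT NULL,                       -- ISO-8601 date or period label
    value     REAL,                                -- NULL allowed for missing values
    status    TEXT NOT NULL DEFAULT 'provisional', -- e.g. 'provisional', 'definitive'
    PRIMARY KEY (series_id, obs_time)
);
CREATE TABLE series_group_member (                 -- named groups of series
    group_name TEXT NOT NULL,
    series_id  INTEGER NOT NULL REFERENCES series(series_id),
    PRIMARY KEY (group_name, series_id)
);
"""

def build_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)

def group_values(conn: sqlite3.Connection, group_name: str, obs_time: str) -> dict:
    """Return {series name: value} for every member of a group at one time point."""
    rows = conn.execute(
        """
        SELECT s.name, o.value
        FROM series_group_member AS g
        JOIN series AS s      ON s.series_id = g.series_id
        JOIN observation AS o ON o.series_id = g.series_id
        WHERE g.group_name = ? AND o.obs_time = ?
        ORDER BY s.name
        """,
        (group_name, obs_time),
    ).fetchall()
    return dict(rows)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    # Toy values for illustration only.
    conn.executemany("INSERT INTO series VALUES (?, ?, ?)",
                     [(1, "FX_USD", "day"), (2, "FX_EUR", "day")])
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)",
                     [(1, "2006-01-02", 1.00, "definitive"),
                      (2, "2006-01-02", 0.83, "provisional")])
    conn.executemany("INSERT INTO series_group_member VALUES (?, ?)",
                     [("fx", 1), ("fx", 2)])
    print(group_values(conn, "fx", "2006-01-02"))
```

Keeping one narrow observation table per database (rather than one column per series) is the simplest way to accommodate series of different granularities; a dedicated data type, as the thesis proposes, would move the grouping logic into the server itself.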
|
12 |
Demand for Addictive Goods: A Study of How the Demand for Snus Products Responded to Price Increases in Sweden, 1999-2009. Buchheim, Viktor, January 2012
This thesis examines the relative price increases for snus products in Sweden and investigates whether they have led to reduced demand, as economic theory predicts. Based on theory and previous research, a demand model was constructed to enable a statistical analysis. The variables in the model were obtained from the price unit of Statistics Sweden (Statistiska centralbyrån) and from Swedish Match AB, and comprise goods prices, sales statistics, and disposable income for the period 1999-2009. The results of regression analyses on the time series data show that the price increases had a negative effect on the demand for snus during the period, but that this effect was relatively small.
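A common way to write such a demand model is a log-log regression, in which the coefficients on log price and log income are read directly as elasticities. The sketch below assumes that specification (the thesis's exact model is not stated here) and fits it by ordinary least squares using only the standard library. The function names are hypothetical, and a real time series analysis would also need to check for autocorrelation and non-stationarity.

```python
import math

def ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y.

    X is a list of rows, each already containing a leading 1.0 for the intercept.
    Uses Gauss-Jordan elimination with partial pivoting; fine for a few regressors.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def demand_elasticities(quantity, price, income):
    """Hypothetical helper: fit ln Q = b0 + b1 ln P + b2 ln Y and return [b0, b1, b2].

    b1 is the own-price elasticity and b2 the income elasticity.
    """
    X = [[1.0, math.log(p), math.log(inc)] for p, inc in zip(price, income)]
    y = [math.log(q) for q in quantity]
    return ols(X, y)
```

A negative but small `b1` (for example, between -1 and 0) would correspond to the finding described above: demand falls when prices rise, but less than proportionally.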
|
13 |
The Application Of Disaggregation Methods To The Unemployment Rate Of Turkey. Tuker, Utku Goksel, 01 September 2010
Modeling and forecasting a country's unemployment rate is important so that precautionary government policies can be put in place. The unemployment rate data for Turkey provided by the Turkish Statistical Institute (TURKSTAT) are not in a format suitable for time series modeling. The unemployment rate data between 1988 and 2009 make it difficult to build a reliable time series model because the observations are too few and irregular in form. Applying disaggregation methods to parts of the unemployment rate data enables us to fit an appropriate time series model and to produce forecasts from the resulting model.
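The abstract does not name the disaggregation methods used. As the simplest illustration of the idea, the pro-rata method below spreads each annual value over sub-annual periods in proportion to a higher-frequency indicator series, scaled so that the sub-annual values average to the annual figure (appropriate for a rate rather than a total). This deliberately basic technique is an assumption for illustration, and the function name and indicator input are hypothetical.

```python
def pro_rata_disaggregate(annual_values, indicator, periods_per_year=4):
    """Hypothetical helper: split annual averages into sub-annual values.

    Each year's block of `indicator` is rescaled so that its mean equals that
    year's annual value; the indicator only supplies the within-year pattern.
    """
    if len(indicator) != len(annual_values) * periods_per_year:
        raise ValueError("indicator must have periods_per_year values per year")
    out = []
    for year, annual in enumerate(annual_values):
        chunk = indicator[year * periods_per_year:(year + 1) * periods_per_year]
        chunk_mean = sum(chunk) / periods_per_year
        if chunk_mean == 0:
            raise ValueError("indicator block has zero mean; cannot scale")
        scale = annual / chunk_mean   # makes the block mean equal the annual value
        out.extend(x * scale for x in chunk)
    return out
```

Pro-rata distribution can introduce jumps between consecutive years; methods such as Denton or Chow-Lin are designed to smooth or model these, at the cost of more machinery.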
|
14 |
Multiple Imputation on Missing Values in Time Series Data. Oh, Sohae, January 2015
Financial stock market data, for various reasons, frequently contain missing values. One reason is that markets close for holidays, so daily stock prices are not always observed. This creates gaps in information, making it difficult to predict the following day’s stock prices. In this situation, information for the holiday can be “borrowed” from other countries’ stock markets, since global stock prices tend to move similarly and are in fact highly correlated. The main goal of this study is to combine stock index data from various markets around the world and to develop an algorithm that imputes the missing values in an individual stock index through “information-sharing” between different time series. To develop an imputation algorithm that accommodates time-series-specific features, we take a multiple imputation approach using a dynamic linear model for time series and panel data. The algorithm assumes an ignorable missing-data mechanism, such as missingness due to holidays. The posterior distributions of the parameters, including the missing values, are simulated using Markov chain Monte Carlo (MCMC) methods, and estimates from the sets of draws are then combined using Rubin’s combining rules to produce the final inference for the data set. Specifically, we use the Gibbs sampler and Forward Filtering and Backward Sampling (FFBS) to simulate the joint posterior distribution and the posterior predictive distribution of the latent variables and other parameters. A simulation study is conducted to check the validity and performance of the algorithm using two error-based measures: Root Mean Square Error (RMSE) and Normalized Root Mean Square Error (NRMSE). We compared the overall trend of the imputed time series with the complete data set, and inspected the in-sample predictability of the algorithm using the Last Value Carried Forward (LVCF) method as a benchmark. The algorithm is applied to real stock price index data from the US, Japan, Hong Kong, the UK, and Germany.
From both the simulation and the application, we concluded that the imputation algorithm performs well enough to achieve our original goal of predicting the opening stock price after a holiday, and that it outperforms the benchmark method. We believe this multiple imputation algorithm can be used in many applications that deal with time series with missing values, such as financial, economic, and biomedical data.
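Two of the simpler components mentioned above can be sketched directly: Rubin's rules for pooling the m completed-data analyses, and the LVCF benchmark scored by RMSE. This is a generic illustration with hypothetical function names; the imputation step itself (Gibbs sampling with FFBS on a dynamic linear model) is not shown.

```python
import math
import statistics

def rubin_combine(estimates, variances):
    """Hypothetical helper: pool m completed-data results with Rubin's rules.

    estimates: point estimates Q_1..Q_m, one per imputed data set (m >= 2)
    variances: their within-imputation variances U_1..U_m
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b               # total variance
    return q_bar, t

def lvcf(series):
    """Last Value Carried Forward: replace each None with the most recent observed value.

    Leading None values stay None because nothing has been observed yet.
    """
    out, last = [], None
    for x in series:
        if x is not None:
            last = x
        out.append(last)
    return out

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(statistics.fmean((a - p) ** 2 for a, p in zip(actual, predicted)))
```

For a holiday gap, LVCF simply reuses the last closing value, which is why it makes a natural baseline for an imputation method that borrows information from correlated markets.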
|
15 |
Linear Time Series Models and Autocorrelation. Γαζή, Σταυρούλα, 07 July 2015
The purpose of this master's thesis is twofold. It concerns the study of the simple and generalized multiple regression model when one of the Gauss-Markov conditions is violated, specifically when Cov{ε_i, ε_j} ≠ 0 for all i ≠ j, and the analysis of time series. First, a brief overview is given of the simple and multiple linear regression models, their properties, and the estimation of the regression coefficients. The properties of the error terms, such as mean, variance, and correlation coefficients, are described for the case in which their covariance property is violated. Finally, the Durbin-Watson test for autocorrelation of the error terms is described, together with a variety of corrective measures aimed at eliminating it.
In the second part, basic concepts of time series theory are introduced first. Various stationary time series are then analyzed: starting from white noise, moving average (MA) series, autoregressive (AR) series, and ARMA series are presented, followed by the general case of non-stationary series, the ARIMA series. For each of these cases, the first stages of analyzing a time series are briefly outlined.
This work is based on two important books by distinguished scholars: Introduction to Econometrics by George K. Christou, and Applied Linear Regression Models by John Neter, Michael H. Kutner, Christopher J. Nachtsheim, and William Wasserman.
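Both the Durbin-Watson test and the first stage of identifying AR and MA models rest on simple autocorrelation computations. The sketch below computes the Durbin-Watson statistic d from regression residuals and the sample autocorrelation function of a series. It returns only the statistics, not the tabulated critical values needed to complete the test, and the function names are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2.

    d is near 2 when residuals show no first-order autocorrelation, below 2 for
    positive and above 2 for negative autocorrelation (roughly d ~ 2 * (1 - r_1)).
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of a series.

    r_k = sum_{t>=k} (x_t - mean)(x_{t-k} - mean) / sum_t (x_t - mean)^2
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    den = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / den for k in range(max_lag + 1)]
```

Residuals that alternate in sign push d toward 4, while a slowly drifting residual series pushes it toward 0; the same lag-1 dependence shows up as a large |r_1| in the sample ACF.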
|
16 |
An Application of Autoregressive Conditional Heteroskedasticity (ARCH) and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) Modelling on Taiwan's Time-Series Data: Three Essays. Chang, Tsangyao, 01 January 1995
In this dissertation, three essays are presented that apply recent advances in time-series methods to the analysis of inflation and stock market index data for Taiwan. Specifically, ARCH and GARCH methodologies are used to investigate claims of increased volatility in economic time-series data since 1980.
In the first essay, analysis that accounts for structural change reveals that the fundamental relationship between inflation and its variability was severed by policies implemented during economic liberalization in Taiwan in the early 1980s. Furthermore, if residuals are corrected for serial correlation, evidence in favor of ARCH effects is weakened. In the second essay, dynamic linkages between daily stock returns and daily trading volume are explored. Both linear and nonlinear dependence are evaluated using Granger causality tests and GARCH modelling. Results suggest significant unidirectional Granger causality from stock returns to trading volume. In the third essay, comparative analysis of the frequency structure of the Taiwan stock index data is conducted using daily, weekly, and monthly data. Results demonstrate that the relationship between mean return and its conditional standard deviation is positive and significant only for high-frequency daily data.
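Since the essays rely on ARCH and GARCH modelling, a sketch of the GARCH(1,1) conditional variance recursion and its Gaussian log-likelihood may help. This is the textbook specification, not the dissertation's estimation code; function names and parameter values are hypothetical, and the parameter check enforces the usual covariance-stationarity condition alpha + beta < 1.

```python
import math

def garch11_variance(eps, omega, alpha, beta, sigma2_0=None):
    """Conditional variances for a GARCH(1,1) model (hypothetical helper).

    sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]
    eps are mean-adjusted returns or regression residuals.
    With beta = 0 this reduces to an ARCH(1) model.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    if sigma2_0 is None:
        sigma2_0 = omega / (1 - alpha - beta)   # start at the unconditional variance
    sigma2 = [sigma2_0]
    for t in range(1, len(eps)):
        sigma2.append(omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1])
    return sigma2

def garch11_loglik(eps, omega, alpha, beta):
    """Gaussian log-likelihood; estimation would maximise this over (omega, alpha, beta)."""
    sigma2 = garch11_variance(eps, omega, alpha, beta)
    return -0.5 * sum(math.log(2 * math.pi) + math.log(s) + e * e / s
                      for e, s in zip(eps, sigma2))
```

A large shock in period t-1 raises the variance forecast for period t through `alpha`, and `beta` controls how long that elevated volatility persists, which is the kind of volatility clustering the essays test for.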
|
17 |
Model selection in time series machine learning applications. Ferreira, E. (Eija), 01 September 2015
Abstract
Model selection is a necessary step for any practical modeling task. Since the true model behind a real-world process cannot be known, the goal of model selection is to find the best approximation among a set of candidate models.
In this thesis, we discuss model selection in the context of time series machine learning applications. We cover four steps of the commonly followed machine learning process: data preparation, algorithm choice, feature selection, and validation. We consider how the characteristics and amount of the available data should guide the choice of algorithms, and how the data set at hand should be divided for model training, selection, and validation to optimize the generalizability and future performance of the model. We also consider the special restrictions and requirements that must be taken into account when applying regular machine learning algorithms to time series data. In particular, we aim to bring forth problems relating to model over-fitting and over-selection that might occur due to careless or uninformed application of model selection methods.
We present our results in three different time series machine learning application areas: resistance spot welding, exercise energy expenditure estimation, and cognitive load modeling. Based on our findings in these studies, we draw general guidelines on which points to consider when starting to solve a new machine learning problem from the point of view of data characteristics, amount of data, computational resources, and the possible time series nature of the problem. We also discuss how the practical aspects and requirements set by the environment where the final model will be implemented affect the choice of algorithms to use.
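One point the abstract stresses, dividing time series data for training and validation without letting the model see the future, can be illustrated with forward-chaining (expanding-window) splits. This is a common technique, not necessarily the thesis's exact protocol, and the function name and parameters are hypothetical.

```python
def expanding_window_splits(n, n_folds, min_train):
    """Yield (train_indices, test_indices) for forward-chaining validation.

    Unlike shuffled k-fold, each test block comes strictly after all of its
    training data, so no future observations leak into training.
    """
    if n_folds < 1 or min_train < 1 or min_train >= n:
        raise ValueError("need n_folds >= 1 and 1 <= min_train < n")
    test_size = (n - min_train) // n_folds
    if test_size == 0:
        raise ValueError("not enough observations for the requested number of folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))
```

In a fuller workflow, a final block at the end of the series would be held out entirely and used only once, after model selection, to estimate future performance; reusing it during selection is one route to the over-selection problem described above.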
|
18 |
NoSQL Time Series Database for Sensor Data. Vizina, Petr, January 2017
This thesis deals with NoSQL databases that can be used to store sensor data with a time series character efficiently. The aim is to design and implement a custom NoSQL-based database solution for storing time series data.
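The abstract does not describe the thesis's storage design, so the following is only a generic sketch of a pattern common in NoSQL time series schemas: readings are grouped into buckets keyed by (sensor, day), so that a time-range query touches only the buckets it spans. The class and method names are hypothetical, and an in-memory dictionary stands in for the database.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta

class BucketedSeriesStore:
    """Toy in-memory stand-in for a bucketed NoSQL time series layout (hypothetical).

    Readings are grouped into buckets keyed by (sensor_id, calendar day); each bucket
    keeps its readings sorted by timestamp, so a range query scans only the days it spans.
    """

    def __init__(self):
        self._buckets = defaultdict(list)  # (sensor_id, date) -> sorted [(timestamp, value)]

    def insert(self, sensor_id: str, ts: datetime, value: float) -> None:
        bisect.insort(self._buckets[(sensor_id, ts.date())], (ts, value))

    def query(self, sensor_id: str, start: datetime, end: datetime) -> list:
        """Return (timestamp, value) pairs with start <= timestamp < end, in time order."""
        out = []
        day = start.date()
        while day <= end.date():
            bucket = self._buckets.get((sensor_id, day), [])
            out.extend(r for r in bucket if start <= r[0] < end)
            day += timedelta(days=1)
        return out
```

The bucket size is a design choice: larger buckets mean fewer lookups per query but bigger rows or documents to rewrite on each insert.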
|
19 |
Exploring Change Point Detection in Network Equipment Logs. Björk, Tim, January 2021
Change point detection (CPD) is the task of detecting sudden changes in time series, and it is particularly important for network traffic. Better knowledge of the changes that occur in data logs after updates to networking equipment allows a deeper understanding of the interactions between those updates and operational resource usage. In a data log that reflects the amount of network traffic, the time series varies widely for reasons such as connection count or external changes to the system. Filtering out these unwanted variation changes while isolating the deliberate ones is a challenge. In this thesis, we use data logs retrieved from a network equipment vendor to detect changes, and then compare the detected changes with the times at which firmware/signature updates were applied, configuration changes were made, etc., with the goal of better understanding any interaction between firmware/signature/configuration changes and operational resource usage. Challenges in data quality and data processing are addressed through data manipulation to counteract anomalies and unwanted variation, and through experimentation with parameters to find the best settings. Experiments are run to test the accuracy of the various change point detection methods and to investigate various parameter settings. Through trial and error, a satisfactory configuration is achieved and used in large-scale log detection experiments. The experiments indicate that additional information about how changes in variation arise is required to derive the desired understanding.
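The abstract does not name the change point detection methods compared, so the following is only a generic illustration: binary segmentation for shifts in the mean, where each candidate split is scored by how much it reduces the squared error. The function names and the `penalty` threshold are hypothetical; the penalty plays the role of the tuning parameters the abstract describes experimenting with.

```python
def _sse(xs):
    """Sum of squared deviations from the mean (0 for an empty segment)."""
    if not xs:
        return 0.0
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def best_split(xs, min_size=2):
    """Return (index, error reduction) of the best single mean shift, or None if too short."""
    total = _sse(xs)
    best = None
    for k in range(min_size, len(xs) - min_size + 1):
        gain = total - _sse(xs[:k]) - _sse(xs[k:])
        if best is None or gain > best[1]:
            best = (k, gain)
    return best

def binary_segmentation(xs, penalty, min_size=2, offset=0):
    """Recursively split while the best split reduces squared error by more than `penalty`.

    Returns sorted change point indices (positions where a new segment starts).
    """
    split = best_split(xs, min_size)
    if split is None or split[1] <= penalty:
        return []
    k, _ = split
    left = binary_segmentation(xs[:k], penalty, min_size, offset)
    right = binary_segmentation(xs[k:], penalty, min_size, offset + k)
    return left + [offset + k] + right
```

A small penalty flags many changes, including noise-driven ones such as fluctuations in connection count; a larger penalty keeps only pronounced shifts, which is the trade-off the thesis tunes by trial and error.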
|
20 |
Organizing HLA data for improved navigation and searchability. Söderbäck, Karl, January 2021
Pitch Technologies specializes in the HLA standard, which specifies data exchange between simulators. The company provides a solution for recording HLA data into a database as raw byte entries. In this thesis, different designs for storing and organizing recorded HLA data in a way that reflects the content of the data are proposed and implemented, with the aim of making the data queryable and analyzable after recording. The designs' impact on storage, read and write performance, and usability is evaluated through a suite of tests run on a PostgreSQL database and a TimescaleDB database. It is concluded that no single design alternative is best in all respects, but the most promising combination is proposed.
|