551 |
A sliding window BIRCH algorithm with performance evaluations. Li, Chuhe (January 2017)
An increasing number of applications in various fields generate transactional or other time-stamped data, all of which belong to time series data. Time series data mining is a popular topic in the data mining field, and it introduces challenges for improving the accuracy and efficiency of algorithms on time series data. Time series data are dynamic, large-scale and highly complex, which makes it difficult to discover patterns among them with common methods suited to static data. BIRCH, a hierarchical clustering method, was proposed and employed to address the problems of large datasets; it minimizes I/O and time costs. A CF tree is generated during its operation, and clusters are produced after the four phases of the whole BIRCH procedure. A drawback of BIRCH is that it is not very scalable. This thesis is devoted to improving the accuracy and efficiency of the BIRCH algorithm. A sliding window BIRCH algorithm is implemented on the basis of the BIRCH algorithm. At the end of the thesis, the accuracy and efficiency of sliding window BIRCH are evaluated. A performance comparison among SW BIRCH, BIRCH and K-means is also presented, using the Silhouette Coefficient and the Calinski-Harabasz index. The preliminary results indicate that SW BIRCH may achieve better performance than BIRCH in some cases.
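As a rough illustration of the windowed-clustering idea described in this abstract, the sketch below re-fits scikit-learn's Birch estimator on successive windows of a simulated 2-D stream so that old points drop out of the model. The window size, threshold, and synthetic data are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
# Hypothetical stream: a shuffled mix of two 2-D Gaussian clusters.
stream = np.vstack([rng.normal(loc=c, scale=0.3, size=(200, 2))
                    for c in ([0.0, 0.0], [5.0, 5.0])])
rng.shuffle(stream)

window_size = 100

# Re-fit BIRCH on each sliding window so only recent points shape the CF tree.
for start in range(0, len(stream) - window_size + 1, window_size):
    window = stream[start:start + window_size]
    model = Birch(threshold=0.5, n_clusters=2).fit(window)

# Cluster assignments for the most recent window.
labels = model.predict(stream[-window_size:])
```

A production variant would instead use Birch's `partial_fit` together with an eviction scheme for expired points, which is closer to what a true sliding-window algorithm requires.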
|
552 |
Price forecasting models in online flower shop implementation. Lu, Zhen Cang (January 2017)
University of Macau / Faculty of Science and Technology / Department of Computer and Information Science
|
553 |
Inaccuracies in the Second Half of Season Five of the Medical Drama, House, MD. Aragon, Bernadette; Luiten, Erica; Apgar, David (January 2012)
Class of 2012 Abstract / Specific Aims: To assess the accuracy of the presenting signs and symptoms, diagnostic procedures, and treatments presented in the last twelve episodes of season five of the popular medical drama, House, MD.
Methods: A descriptive retrospective evaluation of the accuracy and inaccuracies of episodes 13 to 24 in season five of the television series House, MD. The accuracy of the presenting signs and symptoms, diagnostic procedures, and treatment in each episode was rated on a scale of one to four. A rating of one described a correct and usual representation. A rating of two described a correct but somewhat unusual representation. A rating of three described a correct but extremely unusual representation. A rating of four described an incorrect representation. Each researcher independently rated the episodes, and then a collaborative rating was agreed upon by both researchers.
Main Results: Results of the ANOVA test demonstrated a statistically significant difference among the three dependent variables (p=0.002). The Tukey HSD post-hoc test confirmed a significant difference between the accuracy of treatment when compared with signs and symptoms (p=0.012), and with diagnostic procedures (p=0.002). The average rating for the treatment variable was 1.58 (0.9), whereas the average ratings for the signs and symptoms and diagnosis variables were 2.75 (0.754) and 3 (1.128), respectively.
Conclusions: The treatments presented in the last twelve episodes of season five of House, MD were more accurate than both the presenting signs and symptoms and the diagnosis.
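The one-way ANOVA comparison reported above can be reproduced in outline with SciPy. The rating vectors below are illustrative stand-ins chosen to match the reported group means, not the study's raw episode data.

```python
from scipy import stats

# Hypothetical 1-4 accuracy ratings for the twelve episodes, one list per
# variable (illustrative numbers only; means match those reported: 2.75, 3, 1.58).
signs_symptoms = [2, 3, 3, 2, 4, 3, 2, 3, 3, 2, 3, 3]
diagnostics    = [3, 4, 3, 2, 4, 3, 2, 4, 3, 2, 3, 3]
treatments     = [1, 2, 1, 1, 2, 2, 1, 2, 2, 1, 2, 2]

# One-way ANOVA across the three rating variables.
f_stat, p_value = stats.f_oneway(signs_symptoms, diagnostics, treatments)
```

A small p-value here, as in the study, would justify a post-hoc pairwise comparison such as Tukey's HSD to locate which variable differs.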
|
554 |
Stationary multivariate time series analysis. Malan, Karien (13 June 2008)
Multivariate time series analysis became popular in the early 1950s when the need to analyse time series simultaneously arose in the field of economics. This study provides an overview of some of the aspects of multivariate time series analysis in the case of stationarity. The VARMA (vector autoregressive moving average) class of multivariate time series models, including pure vector autoregressive (VAR) and vector moving average (VMA) models, is considered. Methods based on moments and information criteria for the determination of the appropriate order of a model suitable for an observed multivariate time series are discussed. Feasible methods of estimation based on least squares and/or maximum likelihood are provided for the different types of VARMA models. In some cases, the estimation is more complicated due to the identification problem and the nonlinearity of the normal equations. It is shown that the significance of individual estimates can be established by using hypothesis tests based on the asymptotic properties of the estimators. Diagnostic tests for the adequacy of the fitted model are discussed and illustrated. These include methods based on both univariate and multivariate procedures. The complete model building process is illustrated by means of case studies on multivariate electricity demand and temperature time series. Throughout the study numerical examples are used to illustrate concepts. Computer program code (using basic built-in multivariate functions) is given for all the examples. The results are benchmarked against those produced by a dedicated procedure for multivariate time series. It is envisaged that the program code (given in SAS/IML) could be made available to a much wider user community, without much difficulty, by translation into open source platforms. / Dissertation (MSc (Mathematical Statistics))--University of Pretoria, 2008. / Mathematics and Applied Mathematics / unrestricted
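The least-squares estimation step mentioned in this abstract can be sketched in plain NumPy for the simplest member of the VARMA class, a bivariate VAR(1). The coefficient matrix, sample size, and noise level below are illustrative assumptions, not the study's SAS/IML code.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulate a stable bivariate VAR(1): y_t = A y_{t-1} + e_t.
A = np.array([[0.5, 0.1],
              [0.2, 0.4]])
T = 2000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.1, size=2)

# Multivariate least squares: regress y_t on y_{t-1}.
Y = y[1:]            # responses, shape (T-1, 2)
X = y[:-1]           # lagged regressors, shape (T-1, 2)
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
```

For higher orders and moving-average terms the normal equations become nonlinear, which is exactly why the study turns to iterative maximum likelihood methods there.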
|
555 |
Fault location on series compensated transmission lines. Padmanabhan, Shantanu (January 2015)
Fault location for series compensated lines cannot be addressed sufficiently by conventional solutions developed for traditional uncompensated lines. Line parameters vary with loading and weather conditions, and therefore settings used for fault location are often erroneous. Line-parameter-free solutions for fault location are therefore more reliable and accurate than conventional solutions that require such settings. Hence, line-parameter-free fault location algorithms for single-circuit and double-circuit series compensated transmission lines were developed during the research project. Single-circuit lines and double-circuit lines both present unique challenges for fault location. They also vary in the number of available measurements that can be used to arrive at a solution for distance to fault. A third algorithm is presented that allows the extension of existing short-line algorithms to the case of long lines. This is done by providing a method for incorporating the line shunt admittance into these existing algorithms. These three bodies of research work form the focus of this thesis. The algorithms are derived using two-terminal synchronised current and voltage sampled measurements. Of these, the algorithms for series compensated lines are also derived for asynchronous measurements. Phasors are obtained by carrying out a Fast Fourier Transform, and then appropriate calculations are performed for distance to fault. The thesis covers the mathematical derivations of the algorithms, involving the algebraic reduction of non-linear equations in numerous variables into a single expression for distance to fault. The results of a variety of simulation tests are shown subsequently and discussed. Various fault resistances, fault types, degrees of series compensation, line lengths, and fault levels are considered in the tests carried out.
The algorithms are largely found to be highly accurate under these various conditions, and cases where they perform with less accuracy are highlighted and discussed. Lastly, a detailed chapter discussing future work is also included in the thesis.
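The phasor-extraction step mentioned in this abstract (obtaining phasors via a Fourier transform of sampled measurements) can be sketched with a full-cycle DFT. The sampling rate, fundamental frequency, and test signal below are assumed values for illustration, not the thesis's measurement setup.

```python
import numpy as np

f0 = 50.0      # assumed fundamental frequency, Hz
fs = 1600.0    # assumed sampling rate: 32 samples per cycle
n = 32         # exactly one full cycle
t = np.arange(n) / fs

# Hypothetical voltage samples: amplitude 10, phase 30 degrees.
v = 10.0 * np.cos(2 * np.pi * f0 * t + np.deg2rad(30.0))

# Full-cycle DFT: bin 1 corresponds to f0 for this window length,
# and scaling by 2/n recovers the peak-amplitude phasor.
spectrum = np.fft.fft(v)
phasor = 2.0 * spectrum[1] / n

amplitude = np.abs(phasor)               # -> 10.0
phase_deg = np.degrees(np.angle(phasor)) # -> 30.0
```

In a real relay the window slides sample by sample and off-nominal frequency must be handled, but the bin-scaling idea is the same.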
|
556 |
HaMMLeT: An Infinite Hidden Markov Model with Local Transitions. Dawson, Colin Reimer (January 2017)
In classical mixture modeling, each data point is modeled as arising i.i.d. (typically) from a weighted sum of probability distributions. When data arises from different sources that may not give rise to the same mixture distribution, a hierarchical model can allow the source contexts (e.g., documents, sub-populations) to share components while assigning different weights across them (while perhaps coupling the weights to "borrow strength" across contexts). The Dirichlet Process (DP) Mixture Model (e.g., Rasmussen (2000)) is a Bayesian approach to mixture modeling which models the data as arising from a countably infinite number of components: the Dirichlet Process provides a prior on the mixture weights that guards against overfitting. The Hierarchical Dirichlet Process (HDP) Mixture Model (Teh et al., 2006) employs a separate DP Mixture Model for each context, but couples the weights across contexts. This coupling is critical to ensure that mixture components are reused across contexts.
An important application of HDPs is to time series models, in particular Hidden Markov Models (HMMs), where the HDP can be used as a prior on a doubly infinite transition matrix for the latent Markov chain, giving rise to the HDP-HMM (first developed, as the "Infinite HMM", by Beal et al. (2001), and subsequently shown to be a case of an HDP by Teh et al. (2006)). There, the hierarchy is over rows of the transition matrix, and the distributions across rows are coupled through a top-level Dirichlet Process.
In the first part of the dissertation, I present a formal overview of Mixture Models and Hidden Markov Models. I then turn to a discussion of Dirichlet Processes and their various representations, as well as associated schemes for tackling the problem of doing approximate inference over an infinitely flexible model with finite computational resources. I will then turn to the Hierarchical Dirichlet Process (HDP) and its application to an infinite state Hidden Markov Model, the HDP-HMM.
These models have been widely adopted in Bayesian statistics and machine learning. However, a limitation of the vanilla HDP is that it offers no mechanism to model correlations between mixture components across contexts. This is limiting in many applications, including topic modeling, where we expect certain components to occur or not occur together. In the HMM setting, we might expect certain states to exhibit similar incoming and outgoing transition probabilities; that is, for certain rows and columns of the transition matrix to be correlated. In particular, we might expect pairs of states that are "similar" in some way to transition frequently to each other. The HDP-HMM offers no mechanism to model this similarity structure.
The central contribution of the dissertation is a novel generalization of the HDP-HMM which I call the Hierarchical Dirichlet Process Hidden Markov Model With Local Transitions (HDP-HMM-LT, or HaMMLeT for short), which allows for correlations between rows and columns of the transition matrix by assigning each state a location in a latent similarity space and promoting transitions between states that are near each other. I present a Gibbs sampling scheme for inference in this model, employing auxiliary variables to simplify the relevant conditional distributions, which have a natural interpretation after re-casting the discrete time Markov chain as a continuous time Markov Jump Process where holding times are integrated out, and where some jump attempts "fail". I refer to this novel representation as the Markov Process With Failed Jumps. I test this model on several synthetic and real data sets, showing that for data where transitions between similar states are more common, the HaMMLeT model more effectively finds the latent time series structure underlying the observations.
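The Dirichlet Process prior on mixture weights discussed in this abstract is most easily pictured through its stick-breaking representation, sketched below in NumPy. The concentration parameter and truncation level are illustrative assumptions, not values from the dissertation.

```python
import numpy as np

def stick_breaking(alpha, num_components, rng):
    """Draw truncated stick-breaking weights for a DP(alpha) prior.

    Each weight is a Beta(1, alpha) fraction of the stick length
    remaining after all previous breaks.
    """
    betas = rng.beta(1.0, alpha, size=num_components)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

rng = np.random.default_rng(42)
weights = stick_breaking(alpha=2.0, num_components=50, rng=rng)
```

In the HDP-HMM, each row of the transition matrix gets such a weight vector, with the rows coupled through a shared top-level draw; the truncation here is only a finite computational stand-in for the countably infinite model.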
|
557 |
Statistical Analysis of Meteorological Data. Perez Melo, Sergio (01 January 2014)
Some of the more significant effects of global warming are manifested in the rise of temperatures and the increased intensity of hurricanes. This study analyzed data on annual, January and July temperatures in Miami in the period spanning from 1949 to 2011, as well as data on the central pressure and radii of maximum winds of hurricanes from 1944 to the present.
Annual Average, Maximum and Minimum Temperatures were found to be increasing with time, as were July Average, Maximum and Minimum Temperatures. On the other hand, no significant trend could be detected for January Average, Maximum and Minimum Temperatures.
No significant trend was detected in the central pressures and radii of maximum winds of hurricanes, while the radii of maximum winds for the largest hurricane of the year showed an increasing trend.
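A trend detection of the kind reported above can be sketched as a simple linear regression of temperature on year, assuming SciPy's `linregress`. The synthetic series below (a 0.02 °C/yr warming plus noise) is an illustrative assumption, not the Miami data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
years = np.arange(1949, 2012)
# Hypothetical annual mean temperatures with an imposed 0.02 degC/yr trend.
temps = 24.0 + 0.02 * (years - 1949) + rng.normal(scale=0.3, size=years.size)

# Ordinary least squares fit; the slope's p-value tests for a trend.
result = stats.linregress(years, temps)
significant = result.pvalue < 0.05
```

Studies of this kind often prefer the nonparametric Mann-Kendall test, which makes no normality assumption, but the regression slope gives the same qualitative answer here.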
|
558 |
Statistical analysis with the state space model. Chu-Chun-Lin, Singfat (05 1900)
The State Space Model (SSM) encompasses the class of multivariate linear models, in particular, regression models with fixed, time-varying and random parameters, time series models, unobserved components models and combinations thereof. The well-known Kalman Filter (KF) provides a unifying tool for conducting statistical inferences with the SSM.
A major practical problem with the KF concerns its initialization when either the initial state or the regression parameter (or both) in the SSM are diffuse. In these situations, it is common practice to either apply the KF to a transformation of the data which is functionally independent of the diffuse parameters or else initialize the KF with an arbitrarily large error covariance matrix. However, neither approach is entirely satisfactory. The data transformation required in the first approach can be computationally tedious and furthermore it may not preserve the state space structure. The second approach is theoretically and numerically unsound. Recently, however, De Jong (1991) has developed an extension of the KF, called the Diffuse Kalman Filter (DKF), to handle these diffuse situations. The DKF does not require any data transformation.
The thesis contributes further to the theoretical and computational aspects of conducting statistical inferences using the DKF. First, we demonstrate the appropriate initialization of the DKF for the important class of time-invariant SSMs. This result is useful for maximum likelihood statistical inference with the SSM. Second, we derive and compare alternative pseudo-likelihoods for the diffuse SSM. We uncover some interesting characteristics of the DKF and the diffuse likelihood with the class of ARMA models. Third, we propose an efficient implementation of the DKF, labelled the collapsed DKF (CDKF). The latter is derived upon sweeping out some columns of the pertinent matrices in the DKF after an initial number of iterations. The CDKF coincides with the KF in the absence of regression effects in the SSM. We demonstrate that in general the CDKF is superior in practicality and performance to alternative algorithms proposed in the literature. Fourth, we consider maximum likelihood estimation in the SSM using an EM (Expectation-Maximization) approach. Through a judicious choice of the complete data, we develop a CDKF-EM algorithm which does not require the evaluation of lag-one state error covariance matrices for the most common estimation exercise required for the SSM, namely the estimation of the covariance matrices of the disturbances in the SSM. Last, we explore the topic of diagnostic testing in the SSM. We discuss and illustrate the recursive generation of residuals and their usefulness in pinpointing likely outliers and points of structural change. / Business, Sauder School of / Graduate
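The "arbitrarily large error covariance" initialization that this abstract criticizes can be sketched for the simplest SSM, a univariate local level model, in plain NumPy. The variances, data, and the large starting covariance are illustrative assumptions; the thesis's DKF handles the diffuse prior exactly rather than approximately.

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulated local level series: random-walk level plus observation noise.
T = 300
level = np.cumsum(rng.normal(scale=0.1, size=T))
y = level + rng.normal(scale=0.5, size=T)

sigma2_eps, sigma2_eta = 0.25, 0.01  # assumed known variances
a, P = 0.0, 1e7                      # "big kappa" approximate diffuse start

filtered = np.empty(T)
for t in range(T):
    v = y[t] - a                     # innovation (prediction is trivial here)
    F = P + sigma2_eps               # innovation variance
    K = P / F                        # Kalman gain
    a = a + K * v                    # filtered state estimate
    P = P * (1 - K) + sigma2_eta     # predicted variance for the next period
    filtered[t] = a
```

With P set huge, the first update essentially copies the first observation into the state, mimicking a diffuse prior; the numerical instability of this trick in higher-dimensional, multi-parameter SSMs is what motivates the exact DKF treatment.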
|
559 |
Simulation of domestic water re-use systems: greywater and rainwater in combination. Dixon, Andrew Martin (January 2000)
No description available.
|
560 |
Data mining časových řad / Time series data mining. Novák, Petr (January 2009)
This work deals with modern trends in time series data mining.
|