About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Variable screening and graphical modeling for ultra-high dimensional longitudinal data

Zhang, Yafei 02 July 2019 (has links)
Ultrahigh-dimensional variable selection is of great importance in statistical research, and independence screening is a powerful tool for selecting important variables when massive numbers of variables are present. Commonly used independence screening procedures are based on single-replicate data and are not applicable to longitudinal data. This motivates us to propose a new Sure Independence Screening (SIS) procedure to bring the dimension from ultra-high down to a relatively large scale that is similar to or smaller than the sample size. In chapter 2, we provide two types of SIS and their iterative extensions (iterative SIS) to enhance finite-sample performance. An upper bound on the number of variables to be included is derived, and assumptions are given under which sure screening holds. The proposed procedures are assessed by simulations, and an application to a study on systemic lupus erythematosus illustrates their practical use. After the variable screening step, we then explore the relationships among the variables. Graphical models are commonly used to explore the association network for a set of variables, which could be genes or other objects under study. However, the graphical models currently in use are designed only for single-replicate data, not longitudinal data. In chapter 3, we propose a penalized likelihood approach to identify the edges in a conditional independence graph for longitudinal data. We use pairwise coordinate descent combined with second-order cone programming to optimize the penalized likelihood and estimate the parameters. Furthermore, we extend the nodewise regression method to the longitudinal data case. Simulations and a real data analysis exhibit the competitive performance of the penalized likelihood method. / Doctor of Philosophy / Longitudinal data have received a considerable amount of attention in health science studies.
The information from this type of data can help with disease detection and control. In addition, a graph of the factors related to a disease can be built to represent their relationships with each other. In this dissertation, we develop a framework to find, among thousands of factors in longitudinal data, the important factor(s) related to a disease. We also develop a graphical method that shows the relationships among the important factors identified by the screening. In practice, combining these two methods identifies important factors for a disease as well as the relationships among those factors, providing a deeper understanding of the disease.
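The core marginal-screening step behind SIS can be illustrated for single-replicate data (the chapter's contribution extends this to the longitudinal setting; the function name and simulated data below are hypothetical):

```python
import numpy as np

def sis_screen(X, y, d):
    """Rank predictors by absolute marginal correlation with the
    response and keep the top d -- the basic sure independence
    screening idea for single-replicate data."""
    # Column-wise Pearson correlation between each predictor and y.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    # Indices of the d predictors with the largest |correlation|.
    return np.argsort(-np.abs(corr))[:d]

rng = np.random.default_rng(0)
n, p = 100, 1000                     # sample size far below dimension
X = rng.standard_normal((n, p))
y = 3 * X[:, 7] - 2 * X[:, 42] + rng.standard_normal(n)
keep = sis_screen(X, y, d=20)        # screen 1000 variables down to 20
print(sorted(keep))                  # the active variables 7 and 42 survive
```

Screening from p = 1000 down to d = 20 is exactly the "ultra-high to relatively large scale" reduction the abstract describes; a refined selection or graphical model is then fit on the surviving variables.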
2

Longitudinal analysis on AQI in 3 main economic zones of China

Wu, Kailin 09 October 2014 (has links)
In modern China, air pollution has become a pressing environmental problem. Over the last two years, air pollution, as measured by PM2.5 (fine particulate matter), has worsened. My report carries out a longitudinal data analysis of the air quality index (AQI) in three main economic zones of China. Longitudinal data, or repeated-measures data, can be viewed as multilevel data with repeated measurements nested within individuals. I arrive at some conclusions about why the three areas have different AQI, attributing the differences mainly to factors such as population, GDP, temperature, humidity, and whether the area is inland or coastal. The residual variance is partitioned into a between-zone component (the variance of the zone-level residuals) and a within-zone component (the variance of the city-level residuals). The zone residuals represent unobserved zone characteristics that affect AQI. Model building mainly follows the bottom-up sequence described by West et al. (2007), with reference to Singer and Willett (2003) for nonlinear situations. The report also compares a quartic curve model with a piecewise growth model on these data. The final model is a piecewise model with time-level and zone-level predictors and temperature-by-time interactions. / text
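The between-/within-zone variance partition described above can be sketched with a one-way random-effects decomposition for a balanced layout (the zone effects and data below are simulated for illustration, not the report's AQI values):

```python
import numpy as np

def variance_components(zones):
    """Partition total variance into between-zone and within-zone
    components for a balanced one-way layout (k zones, n obs each)."""
    k, n = len(zones), len(zones[0])
    grand = np.mean([v for z in zones for v in z])
    means = [np.mean(z) for z in zones]
    # Between- and within-zone mean squares (one-way ANOVA).
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum(np.sum((np.asarray(z) - m) ** 2)
              for z, m in zip(zones, means)) / (k * (n - 1))
    sigma2_b = max((msb - msw) / n, 0.0)   # between-zone component
    sigma2_w = msw                         # within-zone component
    return sigma2_b, sigma2_w, sigma2_b / (sigma2_b + sigma2_w)

rng = np.random.default_rng(0)
# Three zones with fixed offsets -2, 0, +2 and unit within-zone noise.
zones = [effect + rng.standard_normal(500) for effect in (-2.0, 0.0, 2.0)]
sb, sw, icc = variance_components(zones)
print(round(sb, 2), round(sw, 2), round(icc, 2))
```

The last value is the intraclass correlation: the share of total AQI variation attributable to unobserved zone-level characteristics.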
3

A review of "longitudinal study" in developmental psychology

Finley, Emily H. 01 January 1972 (has links)
The purpose of this library research thesis is to review the "longitudinal study" in terms of problems and present use. A preliminary search of the literature on longitudinal method revealed problems centering around two areas: (1) definition of "longitudinal study" and (2) practical problems of method itself. The purpose of this thesis then is to explore through a search of books and journals the following questions: 1. How can “longitudinal study” be defined? 2. What problems are inherent in the study of the same individuals over time and how can these problems be solved? A third question which emerges from these two is: 3. How is “longitudinal study” being used today? This thesis differentiates traditional longitudinal study from other methods of study: the cross-sectional study, the time-lag study, the experimental study, the retrospective study, and the study from records. Each of these methods of study is reviewed according to its unique problems and best uses and compared with the longitudinal study. Finally, the traditional longitudinal study is defined as the study: (1) of individual change under natural conditions not controlled by the experimenter, (2) which proceeds over time from the present to the future by measuring the same individuals repeatedly, and (3) which retains individuality of data in analyses. Some problem areas of longitudinal study are delineated which are either unique to this method or especially difficult. The following problems related to planning the study are reviewed: definition of study objectives, selection of method of study, statistical methods, cost, post hoc analysis and replication of the study, time factor in longitudinal study, and the problem of allowing variables to operate freely. Cultural shift and attrition are especially emphasized. 
The dilemma posed by sample selection is examined, with its related problems of randomization and generalizability, together with the problems of repeated measurements and selection of control groups. These problems are illustrated with studies from the literature. Not only are these problems delineated, but considerable evidence is shown that we have already started to accumulate data that will permit their solution. This paper presents a number of studies which have considered these problems separately or as a side issue of a study on some other topic. Some recommendations for further research in problem areas are suggested. At the same time that this thesis notes the differentiation of the longitudinal study from other studies, it also notes the integration of results of longitudinal studies with results of other studies. The tenet adopted here is: scientific knowledge is cumulative and not dependent on one crucial experiment. Trends in recent longitudinal studies are toward stricter observance of scientific protocols and toward limitation of the time and objectives of the study. When the objectives of a study are well defined and its duration is limited to only enough time for the specified change to take place, many of the problems of longitudinal study are reduced to manageable proportions. Although modern studies are of improved quality, the longitudinal method is not used sufficiently today to meet the demand for this type of data. Longitudinal study is necessary to answer some of the questions in developmental psychology. We have no alternative but to continue to develop this important research tool.
4

Models for Univariate and Multivariate Analysis of Longitudinal and Clustered Data

Luo, Dandan Unknown Date
No description available.
5

An Empirical Evaluation of Neural Process Meta-Learners for Financial Forecasting

Patel, Kevin G 01 June 2023 (has links) (PDF)
Challenges of financial forecasting, such as a dearth of independent samples and non-stationary underlying processes, limit the relevance of conventional machine learning to financial forecasting. Meta-learning approaches alleviate some of these issues by allowing the model to generalize across unrelated or loosely related tasks with few observations per task. The neural process family achieves this by conditioning forecasts on a supplied context set at test time. Despite their promise, meta-learning approaches remain underutilized in finance. To our knowledge, ours is the first application of neural processes to realized volatility (RV) forecasting and financial forecasting in general. We propose a hybrid temporal convolutional network attentive neural process (ANP-TCN) for financial forecasting. The ANP-TCN combines a conventional and performant financial time series embedding model (TCN) with an ANP objective. We found that ANP-TCN variant models outperformed the base TCN for equity index realized volatility forecasting. In addition, when stack-ensembled with a tree-based model to forecast a trading signal, the ANP-TCN outperformed the baseline buy-and-hold strategy and the base TCN model out of sample. Across four liquid US equity indices (incl. S&P 500) tested over ∼15 years, the best long-short models (reported by median trajectory) achieved the following out-of-sample (∼3 years) performance ranges: directional accuracy of 58.65% to 62.26%, compound annual growth rate (CAGR) of 0.2176 to 0.4534, and annualized Sharpe ratio of 2.1564 to 3.3375. All project code can be found at: https://github.com/kpa28-git/thesis-code.
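Realized volatility, the forecast target here, is conventionally computed from squared log returns; a minimal sketch under that common definition (the simulated price series and annualization convention are illustrative assumptions, not the thesis's data or exact estimator):

```python
import numpy as np

def realized_volatility(prices, periods_per_year=252):
    """Annualized realized volatility over a window of closing prices:
    square root of the mean squared log return, scaled to one year."""
    log_ret = np.diff(np.log(prices))
    return np.sqrt(periods_per_year * np.mean(log_ret ** 2))

rng = np.random.default_rng(1)
# Simulated daily prices with ~1% daily log-return noise (hypothetical).
prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal(500)))
rv = realized_volatility(prices)
print(round(rv, 3))   # roughly sqrt(252) * 0.01, i.e. ~16% annualized
```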
6

Nonlinear Hierarchical Models for Longitudinal Experimental Infection Studies

Singleton, Michael David 01 January 2015 (has links)
Experimental infection (EI) studies, involving the intentional inoculation of animal or human subjects with an infectious agent under controlled conditions, have a long history in infectious disease research. Longitudinal infection response data often arise in EI studies designed to demonstrate vaccine efficacy, explore disease etiology, pathogenesis and transmission, or understand the host immune response to infection. Viral loads, antibody titers, symptom scores and body temperature are a few of the outcome variables commonly studied. Longitudinal EI data are inherently nonlinear, often with single-peaked response trajectories with a common pre- and post-infection baseline. Such data are frequently analyzed with statistical methods that are inefficient and arguably inappropriate, such as repeated measures analysis of variance (RM-ANOVA). Newer statistical approaches may offer substantial gains in accuracy and precision of parameter estimation and power. We propose an alternative approach to modeling single-peaked, longitudinal EI data that incorporates recent developments in nonlinear hierarchical models and Bayesian statistics. We begin by introducing a nonlinear mixed model (NLMM) for a symmetric infection response variable. We employ a standard NLMM assuming normally distributed errors and a Gaussian mean response function. The parameters of the model correspond directly to biologically meaningful properties of the infection response, including baseline, peak intensity, time to peak and spread. Through Monte Carlo simulation studies we demonstrate that the model outperforms RM-ANOVA on most measures of parameter estimation and power. Next we generalize the symmetric NLMM to allow modeling of variables with asymmetric time course. We implement the asymmetric model as a Bayesian nonlinear hierarchical model (NLHM) and discuss advantages of the Bayesian approach. Two illustrative applications are provided. Finally we consider modeling of viral load. 
For several reasons, a normal-errors model is not appropriate for viral load. We propose and illustrate a Bayesian NLHM with the individual responses at each time point modeled as a Poisson random variable with the means across time points related through a Tricube mean response function. We conclude with discussion of limitations and open questions, and a brief survey of broader applications of these models.
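The four biologically interpretable parameters of the symmetric model (baseline, peak intensity, time to peak, spread) enter through a Gaussian mean response. A single-subject nonlinear least-squares sketch is below; the full NLMM additionally places random effects on these parameters across subjects, and the data here are simulated:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_response(t, baseline, peak, t_peak, spread):
    """Single-peaked infection response: a common baseline plus a
    Gaussian bump of height `peak` centered at day `t_peak`."""
    return baseline + peak * np.exp(-((t - t_peak) ** 2) / (2 * spread ** 2))

rng = np.random.default_rng(2)
t = np.linspace(0, 14, 29)                     # days post-inoculation
truth = dict(baseline=37.0, peak=2.5, t_peak=4.0, spread=1.5)
y = gaussian_response(t, **truth) + 0.1 * rng.standard_normal(t.size)

# Fit one subject's trajectory; p0 is a rough starting guess.
est, _ = curve_fit(gaussian_response, t, y, p0=[36.0, 1.0, 5.0, 2.0])
print(np.round(est, 2))   # estimates of baseline, peak, t_peak, spread
```

Each fitted parameter maps directly onto a property of the infection response, which is what makes this parameterization more informative than cell-mean comparisons from RM-ANOVA.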
7

Us and Them: The Role of Inter-Group Distance and Size in Predicting Civil Conflict

Moffett, Michaela E 01 January 2015 (has links)
Recent large-N studies conclude that inequality and ethnic distribution have no significant impact on the risk of civil conflict. This study argues that such conclusions are erroneous and premature due to incorrect specification of independent variables and functional forms. Case studies suggest that measures of inter-group inequality (horizontal inequality) and polarization (ethnic distribution distance from a bipolar equilibrium) are more accurate predictors of civil conflict, as they better capture the group-motivation aspect of conflict. This study explores whether indicators of inequality and ethnic distribution impact the probability of civil conflict across 38 developing countries in the period 1986 to 2004. Analysis reveals that horizontal inequality and polarization have significant, robust relationships with civil conflict. Furthermore, vertical, or individual, inequality is a robust, significant predictor of civil conflict when specified as a nonlinear function.
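A standard polarization measure of the kind described, which peaks when the population is split into two equal blocs, is the Reynal-Querol index; whether this study uses exactly this specification is an assumption, and the group shares below are hypothetical:

```python
def rq_polarization(shares):
    """Reynal-Querol ethnic polarization index:
    P = 4 * sum(pi^2 * (1 - pi)) over group population shares.
    Equals 1 at a perfect 50/50 (bipolar) split, and falls as the
    distribution moves away from that equilibrium."""
    assert abs(sum(shares) - 1.0) < 1e-9, "shares must sum to 1"
    return 4 * sum(p * p * (1 - p) for p in shares)

print(rq_polarization([0.5, 0.5]))         # bipolar split: prints 1.0
print(rq_polarization([0.9, 0.05, 0.05]))  # one dominant group: low value
```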
8

LATENT VARIABLE MODELS GIVEN INCOMPLETELY OBSERVED SURROGATE OUTCOMES AND COVARIATES

Ren, Chunfeng 01 January 2014 (has links)
Latent variable models (LVMs) are commonly used when the outcome of main interest is an unobservable measure, associated with multiple observed surrogate outcomes and affected by potential risk factors. Statistical methodologies and computational software for efficiently analyzing LVMs whose surrogate outcomes and covariates are subject to missingness have, however, been lacking. This thesis develops an approach for efficiently handling missing surrogate outcomes and covariates in two- and three-level latent variable models. We analyze two-level LVMs for longitudinal data from the National Growth of Health Study, where surrogate outcomes and covariates are subject to missingness at any of the levels. A conventional method for efficiently handling missing data is to re-express the desired model as a joint distribution of the variables, including the surrogate outcomes subject to missingness, conditional on the completely observed covariates; estimate the joint model by maximum likelihood; and then transform back to the desired model. The joint model, however, generally identifies more parameters than desired. An over-identified joint model produces biased estimates of the LVM, so it is necessary to describe how to impose constraints on the joint model so that it has a one-to-one correspondence with the desired model, ensuring unbiased estimation. The constrained joint model handles missing data efficiently under the assumption of ignorable missingness and is estimated by a modified application of the expectation-maximization (EM) algorithm.
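The idea of maximum-likelihood estimation under ignorable missingness via EM can be sketched in the simplest possible setting, a bivariate normal with some responses missing. This toy is far simpler than the constrained multilevel LVM developed here and is purely illustrative:

```python
import numpy as np

def em_bivariate_normal(x, y, n_iter=200):
    """EM estimates of the mean and covariance of a bivariate normal
    (x fully observed, y partially missing as np.nan), assuming the
    missingness is ignorable."""
    miss = np.isnan(y)
    mu = np.array([x.mean(), np.nanmean(y)])
    cov = np.array([[x.var(), 0.0], [0.0, np.nanvar(y)]])
    n = x.size
    for _ in range(n_iter):
        # E-step: conditional mean/variance of the missing y given x.
        beta = cov[0, 1] / cov[0, 0]
        y_fill = np.where(miss, mu[1] + beta * (x - mu[0]), y)
        resid_var = cov[1, 1] - beta * cov[0, 1]
        # M-step: moments of the completed data, adding the imputations'
        # residual variance to the y-variance term.
        mu = np.array([x.mean(), y_fill.mean()])
        d = np.vstack([x - mu[0], y_fill - mu[1]])
        cov = d @ d.T / n
        cov[1, 1] += miss.mean() * resid_var
    return mu, cov

rng = np.random.default_rng(3)
n = 2000
x = rng.standard_normal(n)
y = 2.0 + x + 0.5 * rng.standard_normal(n)
y[rng.random(n) < 0.3] = np.nan           # ~30% of y missing at random
mu, cov = em_bivariate_normal(x, y)
print(np.round(mu, 2), round(cov[0, 1], 2))
```

Despite 30% missing responses, the EM estimates recover the true mean and covariance, which is the efficiency gain the constrained joint-model approach pursues in the far richer LVM setting.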
9

Modelos para a análise de dados de contagens longitudinais com superdispersão: estimação INLA / Models for data analysis of longitudinal counts with overdispersion: INLA estimation

Rocha, Everton Batista da 04 September 2015 (has links)
Em ensaios clínicos é muito comum a ocorrência de dados longitudinais discretos. Para sua análise é necessário levar em consideração que dados observados na mesma unidade experimental ao longo do tempo possam ser correlacionados. Além dessa correlação inerente aos dados é comum ocorrer o fenômeno de superdispersão (ou sobredispersão), em que existe uma variabilidade nos dados além daquela captada pelo modelo. Um caso que pode acarretar a superdispersão é o excesso de zeros, podendo também a superdispersão ocorrer em valores não nulos, ou ainda, em ambos os casos. Molenberghs, Verbeke e Demétrio (2007) propuseram uma classe de modelos para acomodar simultaneamente a superdispersão e a correlação em dados de contagens: modelo Poisson, modelo Poisson-gama, modelo Poisson-normal e modelo Poisson-normal-gama (ou modelo combinado). Rizzato (2011) apresentou a abordagem bayesiana para o ajuste desses modelos por meio do Método de Monte Carlo com Cadeias de Markov (MCMC). Este trabalho, para modelar a incerteza relativa aos parâmetros desses modelos, considerou a abordagem bayesiana por meio de um método determinístico para a solução de integrais, INLA (do inglês, Integrated Nested Laplace Approximations). Além dessa classe de modelos, como objetivo, foram propostos outros quatro modelos que também consideram a correlação entre medidas longitudinais e a ocorrência de superdispersão, além da ocorrência de zeros estruturais e não estruturais (amostrais): modelo Poisson inflacionado de zeros (ZIP), modelo binomial negativo inflacionado de zeros (ZINB), modelo Poisson inflacionado de zeros - normal (ZIP-normal) e modelo binomial negativo inflacionado de zeros - normal (ZINB-normal). Para ilustrar a metodologia desenvolvida, um conjunto de dados reais referentes a contagens de ataques epilépticos sofridos por pacientes portadores de epilepsia submetidos a dois tratamentos (um placebo e uma nova droga) ao longo de 27 semanas foi considerado.
A seleção de modelos foi realizada utilizando-se medidas preditivas baseadas em validação cruzada. Sob essas medidas, o modelo selecionado foi o modelo ZIP-normal, em vez do modelo corrente na literatura, o modelo combinado. As rotinas computacionais foram implementadas no programa R e são parte deste trabalho. / Discrete and longitudinal structures naturally arise in clinical trial data. Such data are usually correlated, particularly when the observations are made within the same experimental unit over time, and thus statistical analyses must take this into account. Besides this typical correlation, overdispersion is another common phenomenon in discrete data, defined as greater observed variability than that nominated by the statistical model. The causes of overdispersion are usually related to an excess of observed zeros (zero-inflation), an excess of particular observed positive values, or both. Molenberghs, Verbeke and Demétrio (2007) developed a class of models that encompasses both overdispersion and correlation in count data: the Poisson, Poisson-gamma, Poisson-normal, and Poisson-normal-gamma (combined) models. A Bayesian approach was presented by Rizzato (2011) to fit these models using the Markov chain Monte Carlo (MCMC) method. In this work, a Bayesian framework was adopted as well and, in order to account for the uncertainty in the model parameters, the Integrated Nested Laplace Approximations (INLA) method was used. Along with the models considered in Rizzato (2011), four new models were proposed that include longitudinal correlation, overdispersion, and zero-inflation through structural and random zeros, namely: the zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), zero-inflated Poisson-normal (ZIP-normal), and zero-inflated negative binomial-normal (ZINB-normal) models.
In order to illustrate the developed methodology, the models were fit to a real dataset in which the response variable was the number of epileptic events per week in each individual. These individuals were split into two groups, one taking a placebo and the other an experimental drug, and they were observed for up to 27 weeks. Model selection was based on different predictive measures using cross-validation. Under these measures, the ZIP-normal model was selected over the usual model in the literature (the combined model). The computational routines were implemented in the R language and constitute a part of this work.
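The zero-inflated Poisson building block shared by the ZIP and ZIP-normal models mixes a point mass at zero (structural zeros) with an ordinary Poisson count (random zeros and positive counts); a minimal sketch of its pmf:

```python
import numpy as np
from scipy.stats import poisson

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson pmf: with probability pi the count is a
    structural zero; otherwise it is drawn from Poisson(lam)."""
    base = poisson.pmf(k, lam)
    return np.where(k == 0, pi + (1 - pi) * base, (1 - pi) * base)

k = np.arange(5)
p = zip_pmf(k, lam=2.0, pi=0.3)
print(np.round(p, 4))   # P(0) exceeds the plain Poisson(2) value of ~0.135
```

The ZIP-normal model selected in this work adds a normal random effect on the log mean to capture the longitudinal correlation within patients.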
10

Time Series Decomposition Using Singular Spectrum Analysis

Deng, Cheng 01 May 2014 (has links)
Singular Spectrum Analysis (SSA) is a method for decomposing and forecasting time series that has seen major recent developments but is not yet routinely included in introductory time series courses. An international conference on the topic was held in Beijing in 2012. The basic SSA method decomposes a time series into trend, seasonal component, and noise. There are also more advanced extensions and applications of the method, such as change-point detection and the treatment of multivariate time series. The purpose of this work is to understand the basic SSA method through its application to the monthly average sea temperature at a point on the coast of South America, near where the “El Niño” phenomenon originates, and to artificial time series simulated using harmonic functions. The output of the basic SSA method is then compared with that of other decomposition methods included in some time series courses, such as classic seasonal decomposition, X-11 decomposition using moving averages, and seasonal decomposition by Loess (STL).
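The basic SSA pipeline described, embedding into a trajectory matrix, SVD, and diagonal averaging back into additive components, can be sketched as follows (the window length and the toy trend-plus-seasonal series are arbitrary illustrative choices):

```python
import numpy as np

def ssa_decompose(series, window):
    """Basic SSA: embed the series into a Hankel trajectory matrix,
    take its SVD, and turn each singular triple into an additive
    series component by diagonal (anti-diagonal) averaging."""
    n = len(series)
    k = n - window + 1
    # Trajectory matrix: lagged windows as columns, X[a, b] = series[a + b].
    X = np.column_stack([series[i:i + window] for i in range(k)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    components = []
    for j in range(len(s)):
        Xj = s[j] * np.outer(U[:, j], Vt[j])
        # Average each anti-diagonal (entries with a + b = t) back to a series.
        comp = np.array([np.mean(Xj[::-1].diagonal(t - window + 1))
                         for t in range(n)])
        components.append(comp)
    return components

t = np.arange(120)
series = 0.05 * t + np.sin(2 * np.pi * t / 12)   # linear trend + annual cycle
parts = ssa_decompose(series, window=24)
# The components sum back to the original series exactly.
print(np.allclose(np.sum(parts, axis=0), series))
```

Grouping the leading components then yields the trend and seasonal estimates that the work compares against classical decomposition, X-11, and STL.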
