201

Model selection for cointegrated relationships in small samples

He, Wei January 2008 (has links)
Vector autoregression models have become widely used research tools in the analysis of macroeconomic time series. Cointegration techniques are an essential part of empirical macroeconomic research, as they allow causal long-run relationships between nonstationary variables to be inferred. In this study, six information criteria were reviewed and compared, with the aim of determining which criterion best detects the correct lag structure of a two-variable cointegrated process.
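As an illustration of the kind of comparison the abstract describes, here is a minimal sketch (not the author's code) that simulates a two-variable cointegrated system and lets several information criteria vote on the VAR lag order, assuming statsmodels is available.

```python
# Minimal sketch: lag-order selection for a simulated two-variable
# cointegrated system using several information criteria.
# Illustrative only; not the procedure used in the thesis.
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
T = 200  # small-sample setting

# A shared stochastic trend makes the two series cointegrated.
trend = np.cumsum(rng.normal(size=T))
x = trend + rng.normal(scale=0.5, size=T)
y = 0.8 * trend + rng.normal(scale=0.5, size=T)
data = np.column_stack([x, y])

# Compare AIC, BIC (SC), HQIC and FPE over candidate lag lengths.
sel = VAR(data).select_order(maxlags=8)
print(sel.summary())
print("Selected lags:", sel.selected_orders)  # e.g. {'aic': ..., 'bic': ...}
```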
202

Randomization in a two armed clinical trial: an overview of different randomization techniques

Batidzirai, Jesca Mercy January 2011 (has links)
Randomization is the key element of any sensible clinical trial. It is the only way to ensure that patients are allocated to treatment groups without bias and that the groups are comparable before the trial begins. The randomization scheme used to allocate patients plays a central role in achieving this goal. This study uses SAS simulations and categorical data analysis to compare two main classes of randomization schemes in dental studies with small samples: unrestricted randomization (simple randomization) and restricted randomization (the minimization method). Results show that minimization produces almost equally sized treatment groups, whereas simple randomization is weak at balancing prognostic factors. Nevertheless, simple randomization can, by chance, also produce balanced groups even in small samples. Statistical power is also improved when minimization is used rather than simple randomization, although larger samples might be needed to boost the power.
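For context, a minimal Python sketch (hypothetical, not the SAS programs used in the study) contrasting simple randomization with Pocock-Simon-style minimization over two prognostic factors:

```python
# Sketch: simple randomization vs. minimization (Pocock-Simon style)
# for two arms and two prognostic factors. Illustrative only.
import random

ARMS = ("A", "B")

def simple_randomize(patients):
    """Unrestricted randomization: a coin flip for every patient."""
    return [random.choice(ARMS) for _ in patients]

def minimization(patients, factors=("sex", "age_group")):
    """Assign each new patient to the arm that minimizes factor imbalance."""
    counts = {arm: {f: {} for f in factors} for arm in ARMS}
    assignments = []
    for p in patients:
        totals = {}
        for candidate in ARMS:
            total = 0
            for f in factors:
                level = p[f]
                n = {arm: counts[arm][f].get(level, 0) for arm in ARMS}
                n[candidate] += 1              # pretend p joins `candidate`
                total += abs(n["A"] - n["B"])  # resulting imbalance
            totals[candidate] = total
        best = min(totals.values())
        arm = random.choice([a for a in ARMS if totals[a] == best])
        for f in factors:
            counts[arm][f][p[f]] = counts[arm][f].get(p[f], 0) + 1
        assignments.append(arm)
    return assignments

# Tiny example cohort (hypothetical data).
cohort = [{"sex": random.choice("MF"),
           "age_group": random.choice(["<40", ">=40"])} for _ in range(20)]
print(simple_randomize(cohort))
print(minimization(cohort))
```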
203

A comparison of longitudinal statistical methods in studies of pulmonary function decline

Dimich-Ward, Helen D. 05 1900 (has links)
Three longitudinal pulmonary function data sets were analyzed by several statistical methods for the purposes of: 1) determining to what degree the conclusions of an analysis for a given data set are method dependent; 2) assessing the properties of each method across the different data sets; 3) studying the correlates of FEV₁ decline including physical, behavioral, and respiratory factors, as well as city of residence and type of work; 4) assessing the appropriateness of modelling the standard linear relationship of FEV₁ with time and providing alternative approaches; 5) describing longitudinal change in various lung function variables, apart from FEV₁. The three data sets consisted of (1) yearly data on 141 veterans with mild chronic bronchitis, taken at three Canadian centres, for a maximum of 23 years of follow-up; their mean age at the start of the study was 49 years (s.d.=9) and only 10.6% were nonsmokers during the follow-up; (2) retrospective data on 384 coal workers categorized into four groups according to vital status (dead or alive) and smoking behavior, with irregular follow-up intervals ranging from 2 to 12 measurements per individual over a period of 9 to 30 years; (3) a relatively balanced data set on 269 grain workers and a control group of 58 civic workers, which consisted of 3 to 4 measurements taken over an average follow-up of 9 years. Their mean age at first measurement was 37 years (s.d.=10) and 53.2% of the subjects did not smoke. A review of the pulmonary and statistical literature was carried out to identify methods of analysis which had been applied to calculate annual change in FEV₁. Of the nine methods chosen for the data analyses, five were variants of ordinary least squares approaches; the other four were based on the use of transformations, weighted least squares, or covariance structure models using generalized least squares approaches. For the coal workers, the groups that were alive at the time of ascertainment had significantly smaller average FEV₁ declines than the deceased groups. Post-retirement decline in FEV₁ was shown by one statistical method to significantly increase for coal workers who smoked, while a significant decrease was observed for nonsmokers. Veterans from Winnipeg consistently showed the lowest decline estimates in comparison to Halifax and Toronto; recorded air pollution measurements were found to be the lowest for Winnipeg, while no significant differences in smoking behavior were found between the veterans of each city. The data set of grain workers proved most amenable to all the different analytical techniques, which were consistent in showing no significant differences in FEV₁ decline between the grain and civic workers groups and the lowest magnitude of FEV₁ decline. It was shown that quadratic and allometric analyses provided additional information to the linear description of FEV₁ decline, particularly for the study of pulmonary decline among older or exposed populations over an extended period of time. Whether the various initial lung function variables were each predictive of later decline was dependent on whether absolute or percentage decline was evaluated. The pattern of change in these lung function measures over time showed group differences suggestive of different physiological responses. 
Although estimates of FEV₁ decline were similar between the various methods, the magnitude and relative order of the different groups and the statistical significance of the observed inter-group comparisons were method-dependent. No single method was optimal for analysis of all three data sets. The reliance on only one model, and one type of lung function measurement to describe the data, as is commonly found in the pulmonary literature, could lead to a false interpretation of the result. Thus a comparative approach, using more than one justifiable model for analysis, is recommended, especially in the usual circumstances where missing data or irregular follow-up times create imbalance in the longitudinal data set. / Graduate and Postdoctoral Studies / Graduate
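A minimal sketch (hypothetical data and variable names) of the simplest of the approaches compared here: estimating each subject's annual FEV₁ decline by ordinary least squares on that subject's own follow-up measurements, then summarizing by group.

```python
# Sketch: per-subject OLS slopes of FEV1 on follow-up time, then a group
# summary. Hypothetical data; the thesis also compares weighted and
# covariance-structure (GLS) alternatives.
import numpy as np

rng = np.random.default_rng(1)

def subject_slope(years, fev1):
    """Annual FEV1 change (L/year) from a straight-line OLS fit."""
    slope, _intercept = np.polyfit(years, fev1, deg=1)
    return slope

# Simulate 50 subjects with irregular follow-up (2-10 visits each).
slopes = []
for _ in range(50):
    n_visits = rng.integers(2, 11)
    years = np.sort(rng.uniform(0, 20, size=n_visits))
    true_decline = rng.normal(-0.045, 0.015)        # L/year
    fev1 = 3.5 + true_decline * years + rng.normal(0, 0.15, size=n_visits)
    slopes.append(subject_slope(years, fev1))

slopes = np.array(slopes)
print(f"mean decline {slopes.mean():.3f} L/yr, sd {slopes.std(ddof=1):.3f}")
```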
204

Statistical methods for Mendelian randomization using GWAS summary data

Hu, Xianghong 23 August 2019 (has links)
Mendelian Randomization (MR) is a powerful tool for assessing the causal effect of an exposure on an outcome using genetic variants as instrumental variables. Much of the recent development has been propelled by the increasing availability of GWAS summary data. However, the accuracy of MR causal effect estimates can be compromised when the MR assumptions are violated. Sources of bias include weak effects arising from polygenicity, the presence of horizontal pleiotropy, and other biases such as selection bias. In this thesis, we propose two pieces of work to address these issues. In the first part, we propose a method named Bayesian Weighted Mendelian Randomization (BWMR) for causal inference using summary statistics from GWAS. BWMR not only takes into account the uncertainty of weak effects owing to the polygenicity of the human genome but also models weak horizontal pleiotropic effects. Moreover, BWMR adopts a Bayesian reweighting strategy for the detection of large pleiotropic outliers. An efficient algorithm based on variational inference was developed to make BWMR computationally efficient and stable. Because variational inference tends to underestimate the variance, we further derived a closed-form variance estimator inspired by a linear response method. We conducted several simulations to evaluate the performance of BWMR, demonstrating its advantage over other methods. We then applied BWMR to assess causality between 126 metabolites and 90 complex traits, revealing novel causal relationships. In the second part, we further developed BWMR-C: a statistical correction of selection bias for Mendelian Randomization based on a Bayesian weighted method. Building on the framework of BWMR, the probability model in BWMR-C is built conditional on the IV selection criteria. In this way, BWMR-C is designed to reduce the influence of the selection process on the causal effect estimates while preserving the good properties of BWMR. To make the causal inference computationally stable and efficient, we developed a variational EM algorithm. We conducted several comprehensive simulations to evaluate the performance of BWMR-C in correcting selection bias. We then applied BWMR-C to seven body-fat-distribution-related traits and 140 UK Biobank traits. Our results show that BWMR-C achieves satisfactory performance in correcting selection bias. Keywords: Mendelian Randomization, polygenicity, horizontal pleiotropy, selection bias, variational inference.
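For orientation, here is a minimal sketch of the standard inverse-variance-weighted (IVW) MR estimator computed from GWAS summary statistics; this is the textbook baseline rather than BWMR or BWMR-C, and the summary data below are simulated.

```python
# Sketch: inverse-variance-weighted (IVW) Mendelian randomization from
# GWAS summary statistics. Baseline estimator only, not BWMR/BWMR-C.
import numpy as np

rng = np.random.default_rng(2)
n_snps = 50
true_effect = 0.3                                 # causal effect of exposure on outcome

# Simulated summary statistics: SNP-exposure and SNP-outcome effects.
beta_x = rng.normal(0.0, 0.1, n_snps)             # SNP -> exposure
se_y = np.full(n_snps, 0.02)
beta_y = true_effect * beta_x + rng.normal(0, se_y)   # SNP -> outcome

# IVW estimate: weighted average of per-SNP ratio estimates beta_y/beta_x,
# with weights beta_x**2 / se_y**2.
w = beta_x**2 / se_y**2
ivw = np.sum(w * (beta_y / beta_x)) / np.sum(w)
ivw_se = 1.0 / np.sqrt(np.sum(w))
print(f"IVW causal estimate: {ivw:.3f} (SE {ivw_se:.3f})")
```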
205

Clustering Algorithm for Zero-Inflated Data

January 2020 (has links)
Zero-inflated data are common in biomedical research. In cluster analysis, heuristic approaches fail to provide inferential properties for the outcome, while the existing model-based approach only works for mixtures of multivariate normal distributions. In this dissertation, I developed two new model-based clustering algorithms: the multivariate zero-inflated log-normal and the multivariate zero-inflated Poisson clustering algorithms. I then applied these methods to questionnaire data and compared the resulting clusters to those derived from assuming a multivariate normal distribution. Associations between clustering results and clinical outcomes were also investigated.
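To make "zero-inflated" concrete, a short sketch (hypothetical parameters, not the dissertation's algorithms) that simulates two clusters of multivariate zero-inflated Poisson data, where each entry is zero with probability pi or otherwise a Poisson draw:

```python
# Sketch: simulate two clusters of multivariate zero-inflated Poisson data.
# Each coordinate is 0 with probability pi (structural zero), otherwise
# Poisson(lambda). Illustrative only.
import numpy as np

rng = np.random.default_rng(3)

def zip_sample(n, pi, lam):
    """n draws from a multivariate ZIP with per-dimension pi and lambda."""
    pi, lam = np.asarray(pi), np.asarray(lam)
    counts = rng.poisson(lam, size=(n, lam.size))
    structural_zero = rng.random((n, lam.size)) < pi
    return np.where(structural_zero, 0, counts)

cluster1 = zip_sample(100, pi=[0.6, 0.2, 0.4], lam=[2.0, 5.0, 1.0])
cluster2 = zip_sample(100, pi=[0.1, 0.7, 0.3], lam=[6.0, 1.5, 3.0])
data = np.vstack([cluster1, cluster2])
print("proportion of zeros per variable:", (data == 0).mean(axis=0))
```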
206

STATISTICAL MODELING OF SHIP AIRWAKES INCLUDING THE FEASIBILITY OF APPLYING MACHINE LEARNING

Unknown Date (has links)
Airwakes are shed behind the ship’s superstructure and represent a highly turbulent and rapidly distorting flow field. This flow field severely affects the pilot’s workload and thus helicopter shipboard operations. A relatively complete description requires both one-point statistics (the autospectrum) and two-point statistics (the coherence, or normalized cross-spectrum). Recent advances primarily refer to generating databases of flow velocity points through experimental and computational fluid dynamics (CFD) investigations, numerically computing autospectra along with a few cases of cross-spectra and coherences, and developing a framework for extracting interpretive models of autospectra in closed form from a database, along with an application of this framework to study the downwash effects. By comparison, relatively little is known about coherences. In fact, even the basic expressions of cross-spectra and coherences for the three components of homogeneous isotropic turbulence (HIT) vary from one study to another, and the related literature is scattered and piecemeal. Accordingly, this dissertation begins with a unified account of all the cross-spectra and coherences of HIT from first principles. Then, it presents a framework for constructing interpretive coherence models of the airwake from a database on the basis of perturbation theory. For each velocity component, the coherence is represented by a separate perturbation series in which the basis function, or the first term on the right-hand side of the series, is the corresponding coherence for HIT. The perturbation series coefficients are evaluated by satisfying the theoretical constraints and fitting a curve in a least squares sense to a set of numerically generated coherence points from a database. Although not tested against a specific database, the framework has a mathematical basis. Moreover, for assumed values of the perturbation series constants, coherence results are presented to demonstrate how coherences of airwakes and similar flow fields compare to those of HIT. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection
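As a pointer to the one-point and two-point statistics mentioned here, a minimal sketch (synthetic signals, assuming SciPy) that estimates an autospectrum with Welch's method and the coherence between two velocity components:

```python
# Sketch: autospectrum (Welch) and coherence of two synthetic "velocity"
# signals sharing a common low-frequency component. Illustrative only.
import numpy as np
from scipy.signal import welch, coherence

rng = np.random.default_rng(4)
fs = 100.0                      # sampling rate, Hz
t = np.arange(0, 60, 1 / fs)

common = np.convolve(rng.normal(size=t.size), np.ones(50) / 50, mode="same")
u = common + 0.5 * rng.normal(size=t.size)   # longitudinal component
w = common + 0.5 * rng.normal(size=t.size)   # vertical component

f_uu, S_uu = welch(u, fs=fs, nperseg=1024)         # one-point: autospectrum
f_uw, C_uw = coherence(u, w, fs=fs, nperseg=1024)  # two-point: coherence
print("peak autospectral frequency:", f_uu[np.argmax(S_uu)])
print("mean coherence below 1 Hz:", C_uw[f_uw < 1.0].mean())
```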
207

Advances in Machine Learning for Compositional Data

Gordon Rodriguez, Elliott January 2022 (has links)
Compositional data refers to simplex-valued data, or equivalently, nonnegative vectors whose totals are uninformative. This data modality is of relevance across several scientific domains. A classical example of compositional data is the chemical composition of geological samples, e.g., major-oxide concentrations. A more modern example arises from the microbial populations recorded using high-throughput genetic sequencing technologies, e.g., the gut microbiome. This dissertation presents a set of methodological and theoretical contributions that advance the state of the art in the analysis of compositional data. Our work can be divided along two categories: problems in which compositional data represents the input to a predictive model, and problems in which it represents the output of the model. For the first class of problems, we build on the popular log-ratio framework to develop an efficient learning algorithm for high-dimensional compositional data. Our algorithm runs orders of magnitude faster than competing alternatives, without sacrificing model quality. For the second class of problems, we define a novel exponential family of probability distributions supported on the simplex. This distribution enjoys attractive mathematical properties and provides a performant probability model for simplex-valued outcomes. Taken together, our results constitute a broad contribution to the toolkit of researchers and practitioners studying compositional data.
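A brief sketch of the log-ratio framework mentioned above: closing raw counts to the simplex and applying the centered log-ratio (CLR) transform (hypothetical data; a small pseudocount handles zeros):

```python
# Sketch: closure to the simplex and the centered log-ratio (CLR) transform,
# the classical starting point for compositional data analysis.
import numpy as np

def closure(x):
    """Rescale nonnegative rows so each sums to 1 (the simplex)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def clr(x, pseudocount=0.5):
    """Centered log-ratio transform; the pseudocount guards against zeros."""
    comp = closure(np.asarray(x, dtype=float) + pseudocount)
    logx = np.log(comp)
    return logx - logx.mean(axis=1, keepdims=True)

# Toy microbiome-style count table: 3 samples x 4 taxa (hypothetical).
counts = np.array([[120, 30,  0,  50],
                   [ 80, 10,  5,   5],
                   [  0, 60, 40, 100]])
print(clr(counts))   # each row sums to ~0, as CLR coordinates should
```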
208

The estimation of missing values in hydrological records using the EM algorithm and regression methods

Makhuvha, Tondani January 1988 (has links)
Includes bibliography. / The objective of this thesis is to review existing methods for estimating missing values in rainfall records and to propose a number of new procedures. Two classes of methods are considered. The first is based on the theory of variable selection in regression. Here the emphasis is on finding efficient methods to identify the set of control stations which are likely to yield the best regression estimates of the missing values in the target station. The second class of methods is based on the EM algorithm, proposed by Dempster, Laird and Rubin (1977). The emphasis here is to estimate the missing values directly without first making a detailed selection of control stations. All "relevant" stations are included. This method has not previously been applied in the context of estimating missing rainfall values.
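A compact sketch of the EM idea in the simplest setting: one control station x fully observed and one target station y with gaps, both modeled as bivariate normal. This is illustrative only, not the multi-station procedure developed in the thesis.

```python
# Sketch: EM for a bivariate normal (x = control station, y = target station)
# when some y values are missing. Illustrative; the thesis handles many
# control stations and compares EM with regression-based selection.
import numpy as np

rng = np.random.default_rng(5)

# Simulate monthly rainfall-like data and knock out 30% of y.
n = 300
x = rng.gamma(shape=4.0, scale=20.0, size=n)
y = 10 + 0.9 * x + rng.normal(0, 15, size=n)
miss = rng.random(n) < 0.3

mu = np.array([x.mean(), y[~miss].mean()])
cov = np.cov(x[~miss], y[~miss])          # start from complete cases

for _ in range(50):
    # E-step: conditional mean/variance of the missing y given x.
    slope = cov[0, 1] / cov[0, 0]
    resid_var = cov[1, 1] - cov[0, 1] ** 2 / cov[0, 0]
    ey = np.where(miss, mu[1] + slope * (x - mu[0]), y)
    ey2 = np.where(miss, ey ** 2 + resid_var, y ** 2)
    # M-step: update mean vector and covariance matrix.
    mu = np.array([x.mean(), ey.mean()])
    sxx = np.mean((x - mu[0]) ** 2)
    sxy = np.mean((x - mu[0]) * (ey - mu[1]))
    syy = np.mean(ey2) - mu[1] ** 2
    cov = np.array([[sxx, sxy], [sxy, syy]])

print("estimated mean:", mu)
print("imputed y for the first missing months:", np.round(ey[miss][:5], 1))
```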
209

An empirical evaluation of the Altman (1968) failure prediction model on South African JSE listed companies

Rama, Kavir D. 18 March 2013 (has links)
Credit has become very important in the global economy (Cynamon and Fazzari, 2008). The Altman (1968) failure prediction model, or derivatives thereof, is often used in the identification and selection of financially distressed companies, as it is recognized as one of the most reliable models for predicting company failure (Eidleman, 1995). Failure of a firm can cause substantial losses to creditors and shareholders; it is therefore important to detect company failure as early as possible. This research report empirically tests the Altman (1968) failure prediction model on 227 South African JSE listed companies, using data from the 2008 financial year to calculate the Z-score within the model and measuring success or failure of firms in the 2009 and 2010 years. The results indicate that the Altman (1968) model is a viable tool in predicting company failure for firms with positive Z-scores, and where Z-scores do not fall into the range of uncertainty as specified. The results also suggest that the model is not reliable when the Z-scores are negative or when they are in the range of uncertainty (between 1.81 and 2.99). If one is able to predict firm failure in advance, it should be possible for management to take steps to avert such an occurrence (Deakin, 1972; Keasey and Watson, 1991; Platt and Platt, 2002).
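For reference, the original Altman (1968) Z-score combines five financial ratios with fixed published coefficients; a small sketch with hypothetical figures:

```python
# Sketch: original Altman (1968) Z-score with its published coefficients.
# The input figures below are hypothetical.
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

z = altman_z(working_capital=50, retained_earnings=120, ebit=40,
             market_value_equity=300, sales=500,
             total_assets=400, total_liabilities=250)
# Z > 2.99: "safe" zone; Z < 1.81: distress zone; in between: zone of uncertainty.
print(f"Z = {z:.2f}")
```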
210

Is the way forward to step back? A meta-research analysis of misalignment between goals, methods, and conclusions in epidemiologic studies.

Kezios, Katrina Lynn January 2021 (has links)
Recent discussion in the epidemiologic methods and teaching literatures centers around the importance of clearly stating study goals, disentangling the goal of causation from prediction (or description), and clarifying the statistical tools that can address each goal. This discussion illuminates different ways in which mismatches can occur between study goals, methods, and interpretations, which this dissertation synthesizes into the concept of “misalignment”; misalignment occurs when the study methods and/or interpretations are inappropriate for (i.e., do not match) the study’s goal. While misalignments can occur and may cause problems, their pervasiveness and consequences have not been examined in the epidemiologic literature. Thus, the overall purpose of this dissertation was to document and examine the effects of misalignment problems seen in epidemiologic practice. First, a review was conducted to document misalignment in a random sample of epidemiologic studies and explore how the framing of study goals contributes to its occurrence. Among the reviewed articles, full alignment between study goals, methods, and interpretations was infrequently observed, although “clearly causal” studies (those that framed causal goals using causal language) were more often fully aligned (5/13, 38%) than “seemingly causal” ones (those that framed causal goals using associational language; 3/71, 4%). Next, two simulation studies were performed to examine the potential consequences of different types of misalignment problems seen in epidemiologic practice. They are based on the observation that, often, studies that are causally motivated perform analyses that appear disconnected from, or “misaligned” with, their causal goal. A primary aim of the first simulation study was to examine goal--methods misalignment in terms of inappropriate variable selection for exposure effect estimation (a causal goal). The main difference between predictive and causal models is the conceptualization and treatment of “covariates”. Therefore, exposure coefficients were compared from regression models built using different variable selection approaches that were either aligned (appropriate for causation) or misaligned (appropriate for prediction) with the causal goal of the simulated analysis. The regression models were characterized by different combinations of variable pools and inclusion criteria to select variables from the pools into the models. Overall, for valid exposure effect estimation in a causal analysis, the creation of the variable pool mattered more than the specific inclusion criteria, and the most important criterion when creating the variable pool was to exclude mediators. The second simulation study concretized the misalignment problem by examining the consequences of goal--method misalignment in the application of the structured life course approach, a statistical method for distinguishing among different causal life course models of disease (e.g., critical period, accumulation of risk). Although exchangeability must be satisfied for valid results using this approach, in its empirical applications, confounding is often ignored. These applications are misaligned because they use methods for description (crude associations) for a causal goal (identifying causal processes). Simulations were used to mimic this misaligned approach and examined its consequences. 
On average, when life course data was generated under a “no confounding” scenario - an unlikely real-world scenario - the structured life course approach was quite accurate in identifying the life course model that generated the data. However, in the presence of confounding, the wrong underlying life course model was often identified. Five life course confounding structures were examined; as the complexity of examined confounding scenarios increased, particularly when this confounding was strong, incorrect model selection using the structured life course approach was common. The misalignment problem is recognized but underappreciated in the epidemiologic literature. This dissertation contributes to the literature by documenting, simulating, and concretizing problems of misalignment in epidemiologic practice.
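A minimal simulation sketch of the variable-selection point above (hypothetical structure, not the dissertation's simulation design): when X affects Y partly through a mediator M, adding M to the regression removes the indirect path and no longer recovers the total causal effect of X.

```python
# Sketch: why including a mediator in the variable pool distorts estimation
# of a total causal effect. X -> M -> Y plus a direct X -> Y path.
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
x = rng.normal(size=n)
m = 0.8 * x + rng.normal(size=n)             # mediator
y = 0.5 * x + 0.6 * m + rng.normal(size=n)   # total effect of x = 0.5 + 0.6*0.8 = 0.98

def ols_coefs(design, outcome):
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs

ones = np.ones(n)
b_aligned = ols_coefs(np.column_stack([ones, x]), y)        # mediator excluded
b_misaligned = ols_coefs(np.column_stack([ones, x, m]), y)  # mediator included
print(f"coefficient on x without M: {b_aligned[1]:.2f}  (total effect ~0.98)")
print(f"coefficient on x with    M: {b_misaligned[1]:.2f}  (direct effect ~0.50)")
```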
