311

Permutation procedures for ANOVA, regression and PCA

Storm, Christine 24 May 2013 (has links)
Parametric methods are effective and appropriate when data sets are obtained by well-defined random sampling procedures, the population distribution for responses is well defined, the null sampling distributions of suitable test statistics do not depend on any unknown entity, and well-defined likelihood models are provided for nuisance parameters. Permutation testing methods, on the other hand, are appropriate and unavoidable when distribution models for responses are not well specified, are nonparametric, or depend on too many nuisance parameters; when ancillary statistics in well-specified distributional models have a strong influence on inferential results or are confounded with other nuisance entities; when the sample sizes are smaller than the number of parameters; and when data sets are obtained by ill-specified selection-bias procedures. In addition, permutation tests are useful not only when parametric tests are not possible, but also when more importance needs to be given to the observed data set than to the population model, as is typical, for example, in biostatistics. The different types of permutation methods for analysis of variance, multiple linear regression and principal component analysis are explored. More specifically, one-way, two-way and three-way ANOVA permutation strategies are discussed. Approximate and exact permutation tests for the significance of one or more regression coefficients in a multiple linear regression model are explained next, and lastly, the use of permutation tests as a means to validate and confirm the results obtained from exploratory PCA is described. / Dissertation (MSc)--University of Pretoria, 2012. / Statistics / unrestricted
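The one-way ANOVA permutation strategy the abstract mentions can be illustrated with a minimal sketch: compute the usual F statistic on the observed group labels, then re-randomize the labels many times and report the fraction of permuted F values at least as large. The function name and the Monte Carlo setup below are illustrative, not taken from the thesis.

```python
import numpy as np

def perm_anova_oneway(groups, n_perm=5000, seed=0):
    """Approximate permutation test for one-way ANOVA.

    groups: list of 1-D arrays, one per treatment group.
    Returns the observed F statistic and a permutation p-value
    obtained by shuffling group membership of the pooled data.
    """
    rng = np.random.default_rng(seed)
    data = np.concatenate(groups)
    sizes = [len(g) for g in groups]

    def f_stat(x):
        # Split the (possibly permuted) pooled vector back into groups.
        parts, start = [], 0
        for n in sizes:
            parts.append(x[start:start + n])
            start += n
        grand = x.mean()
        ss_between = sum(len(p) * (p.mean() - grand) ** 2 for p in parts)
        ss_within = sum(((p - p.mean()) ** 2).sum() for p in parts)
        df_b, df_w = len(parts) - 1, len(x) - len(parts)
        return (ss_between / df_b) / (ss_within / df_w)

    f_obs = f_stat(data)
    count = sum(f_stat(rng.permutation(data)) >= f_obs for _ in range(n_perm))
    # +1 correction keeps the p-value strictly positive (standard practice).
    return f_obs, (count + 1) / (n_perm + 1)
```

The same shuffle-and-recompute pattern extends to the regression-coefficient tests described above, with the response (or residuals) permuted instead of group labels.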
312

A detailed investigation of the linear model and some of its underlying assumptions

Coutsourides, Dimitris January 1977 (has links)
Bibliography: p. 178-182. / The purpose of this thesis is to provide a study of the linear model. The work is split into six chapters. In Chapter 1 we define and examine the two linear models, i.e. the regression and the correlation model; more specifically, we show that the regression model is the conditional version of the correlation model. In Chapter 2 we deal with the problem of multicollinearity: we investigate the sources of near singularities, give some methods of detecting multicollinearity, and briefly state methods for overcoming the problem. In Chapter 3 we consider the least squares method with restrictions and present tests of the linear restrictions; the theory concerning the sign of least squares estimates is discussed, and we then deal with a method for augmenting existing data. Chapter 4 is mainly devoted to ridge regression, where we state methods for selecting the best estimate of the ridge parameter k; some extensions are given dealing with shrinkage estimators and linear transforms of the least squares estimator. In Chapter 5 we deal with principal components and give methods for selecting the best subset of principal components, with particular attention to the fractional rank method and latent root regression analysis. In Chapter 6 comparisons are performed between the estimators previously mentioned, and finally the conclusions are stated.
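The ridge estimator discussed in Chapter 4 has the classical closed form β̂(k) = (XᵀX + kI)⁻¹Xᵀy, and "selecting the best estimate of k" is typically done by tracing β̂(k) over a grid. A minimal sketch, assuming X is centred and scaled as in the usual ridge-trace setup:

```python
import numpy as np

def ridge_estimates(X, y, ks):
    """Ridge estimators beta_hat(k) = (X'X + kI)^{-1} X'y over a grid of k.

    k = 0 recovers ordinary least squares; increasing k shrinks the
    coefficient vector toward zero. Returns an array of shape (len(ks), p).
    """
    XtX = X.T @ X
    Xty = X.T @ y
    p = X.shape[1]
    return np.array([np.linalg.solve(XtX + k * np.eye(p), Xty) for k in ks])
```

Plotting the rows of the returned array against k gives the ridge trace used to judge where the coefficients stabilize.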
313

Regression analysis of big count data via a-optimal subsampling

Zhao, Xiaofeng 19 July 2018 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / There are two computational bottlenecks in Big Data analysis: (1) the data is too large for a desktop to store, and (2) the computing task takes too long to finish. While the divide-and-conquer approach easily breaks the first bottleneck, the subsampling approach breaks both simultaneously. Uniform sampling and nonuniform sampling--in particular, leverage-score sampling--are frequently used in the recent development of fast randomized algorithms. However, as Peng and Tan (2018) have demonstrated, neither approach is effective in extracting important information from data. In this thesis, we conduct regression analysis for big count data via A-optimal subsampling. We derive A-optimal sampling distributions by minimizing the trace of certain dispersion matrices in general estimating equations (GEE). We point out that computing the A-optimal distributions exactly takes the same running time as the full-data M-estimator, so to compute the distributions quickly we propose the A-optimal Scoring Algorithm, which is implementable by parallel computing, sequentially updatable for streaming data, and faster than the full-data M-estimator. We present asymptotic normality for the estimates in GEEs and in generalized count regression, and introduce a data truncation method. We conduct extensive simulations to evaluate the numerical performance of the proposed sampling distributions, and apply the proposed A-optimal subsampling method to two real count data sets, the Bike Sharing data and the Blog Feedback data. Our results on both simulated and real data indicate that the A-optimal distributions substantially outperform the uniform distribution while running faster than the full-data M-estimators.
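The general subsampling recipe underlying this line of work can be sketched as follows: draw r rows with replacement according to a sampling distribution, then solve inverse-probability-weighted estimating equations on the subsample. The sketch below uses plain linear regression and leaves the probabilities as an input, since the thesis's A-optimal probabilities for count models are derived from GEE dispersion matrices not reproduced here; the function name and setup are illustrative.

```python
import numpy as np

def subsample_estimate(X, y, probs, r, seed=0):
    """Inverse-probability-weighted subsample estimator for linear regression.

    probs: sampling probability for each of the n rows (sums to 1);
    uniform subsampling corresponds to probs = 1/n for every row.
    Draws r rows with replacement and solves the weighted normal equations.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=r, replace=True, p=probs)
    w = 1.0 / probs[idx]            # Hansen-Hurwitz style weights
    Xs, ys = X[idx], y[idx]
    XtWX = Xs.T @ (w[:, None] * Xs)
    XtWy = Xs.T @ (w * ys)
    return np.linalg.solve(XtWX, XtWy)
```

Swapping in a nonuniform `probs` (leverage-score or A-optimal) changes which rows are likely to be drawn while the weights keep the estimator approximately unbiased.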
314

Machine learning and template based modeling for improving and expanding the functionality of rigid body docking

Desta, Israel Tilahun 28 September 2021 (has links)
Proteins govern practically every process in living organisms by inhibiting, activating, or otherwise acting on other proteins. With the large and growing number of interactions known through high-throughput screening technologies, experimental determination of atomic-level details of these interactions is nigh impossible. Computational methods such as docking can speed up efforts to understand these interactions; however, several issues must be addressed before docking can replace experimental methods. This thesis describes work on assessment of the state of the art in docking methods, implementation of a machine learning algorithm to improve model ranking, and integration of docking with template-based modeling to expand its usage, with a special focus on antibody-antigen interactions. Firstly, the performance of docking methods was rigorously assessed using a diverse set of protein complexes, with particular attention to ClusPro, one of the leading rigid-body docking servers. Different strengths and potential areas of improvement for ClusPro, and for rigid-body docking methods in general, were highlighted. Secondly, one of the major shortcomings of docking noted in the first project, the poor ranking of good models, was addressed: a regression-based machine learning algorithm was introduced to improve the ranking. Finally, a server was developed to tackle the challenge of epitope mapping by integrating template-based modeling with docking. An intuitive ensemble approach to scoring residue likelihood using docking poses and different homologues is shown to be highly successful. In addition to shifting docking's purpose from conformational search to interface identification, this server also allows users to start with protein sequence inputs. / 2022-09-28T00:00:00Z
315

Linear Regression Analysis of the Suspended Sediment Load in Rivers and Streams Using Data of Similar Precipitation Values

Jamison, Jonathan A. 21 November 2018 (has links)
No description available.
316

Comparison of Regression Methods with Non-Convex Penalties

Pipher, Brandon 07 November 2019 (has links)
No description available.
317

A multi-gene symbolic regression approach for predicting LGD : A benchmark comparative study

Tuoremaa, Hanna January 2023 (has links)
Under the Basel accords for measuring regulatory capital requirements, the credit risk parameters probability of default (PD), exposure at default (EAD) and loss given default (LGD) are measured with a bank's own estimates under the internal ratings-based approach. The estimated parameters are also the foundation for understanding the actual risk in a bank's credit portfolio, so the predictive performance of such models is interesting to examine. LGD models have been observed to perform poorly, and LGD values are generally hard to estimate. The main purpose of this thesis is to analyse the predictive performance of a multi-gene genetic programming approach to symbolic regression compared with three benchmark regression models. The goal of multi-gene symbolic regression is to estimate the underlying relationship in the data through a linear combination of a set of generated mathematical expressions. The benchmark models are logit-transformed regression, beta regression and regression trees, all frequently used in the area. The data used to compare the models is a set of randomly selected, de-identified loans from the portfolios underlying U.S. residential mortgage-backed securities, retrieved from International Finance Research. The conclusion from implementing and comparing the models is that the credit risk parameter LGD remains difficult to estimate, and that the symbolic regression approach neither yielded better predictive ability than the benchmark models nor appeared to find the underlying relationship in the data. The benchmark models are more user-friendly, easier to implement, and all require less computational effort than symbolic regression.
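The core idea of multi-gene symbolic regression, a linear combination of generated expressions, can be sketched without the genetic programming machinery: given a pool of candidate expressions (the "genes", which in the real method are evolved expression trees), the combining weights are fitted by least squares. The gene pool below is a fixed illustrative stand-in for one GP generation, not taken from the thesis.

```python
import numpy as np

# Illustrative gene pool; real multi-gene GP evolves these expression trees.
genes = [
    lambda x: x,
    lambda x: x ** 2,
    lambda x: np.log1p(np.abs(x)),
    lambda x: np.tanh(x),
]

def fit_gene_weights(x, y):
    """Least-squares weights for y ~ w0 + sum_i w_i * gene_i(x)."""
    G = np.column_stack([np.ones_like(x)] + [g(x) for g in genes])
    w, *_ = np.linalg.lstsq(G, y, rcond=None)
    return w

def predict(x, w):
    G = np.column_stack([np.ones_like(x)] + [g(x) for g in genes])
    return G @ w
```

In the evolutionary loop, the fitted weights let each candidate gene set be scored cheaply (e.g. by residual error), and the best sets are recombined to form the next generation.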
318

An Empirical Study on Correlation Patterns of Disruptions by Flooding Hazards

Wang, Jin 17 May 2014 (has links)
Flooding is a deadly natural hazard that frequently causes serious damage to infrastructure, and its characteristics are expected to change as the global climate changes. This paper identifies the spatio-temporal correlation patterns of disruptions caused by flooding hazards at the county level for the Deep South of the United States, particularly the state of Arkansas. The frequency of flooding disruptions, calculated as a time series for each county, is generated from flooding records over the research period 1998-2013. A set of quality-control procedures, including a duplicate-data check, a spatial-outlier check, and a homogeneity test, is applied prior to the regression analysis. The spatial characteristics of the disruptions are identified by mapping them, while their temporal characteristics are assessed using the correlation coefficient defined in this paper. For most pairs of locations, and consistently throughout the study period, the correlation of flooding disruptions is found to increase as the distance between the locations decreases.
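The distance-versus-correlation analysis described above can be sketched with plain Pearson correlations: compute the correlation between every pair of county disruption-frequency series and pair it with the distance between the two counties. The function below is a minimal illustration (Euclidean distance on planar coordinates), not the specific coefficient defined in the paper.

```python
import numpy as np

def correlation_vs_distance(series, coords):
    """Pairwise Pearson correlations of disruption-frequency time series,
    paired with the distance between the corresponding locations.

    series: array of shape (n_locations, n_periods), counts per period.
    coords: array of shape (n_locations, 2), planar location coordinates.
    Returns (distances, correlations) over all unordered location pairs.
    """
    n = len(series)
    C = np.corrcoef(series)  # rows are variables by default
    dists, corrs = [], []
    for i in range(n):
        for j in range(i + 1, n):
            dists.append(np.linalg.norm(coords[i] - coords[j]))
            corrs.append(C[i, j])
    return np.array(dists), np.array(corrs)
```

Regressing the correlations on the distances (or simply plotting them) then makes the reported decay-with-distance pattern visible.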
319

Evaluation of Herbicide Formulation and Spray Nozzle Selection on Physical Spray Drift

Cobb, Jasper Lewis 13 December 2014 (has links)
New transgenic crops are currently being developed that will be tolerant to dicamba and 2,4-D herbicides. This technology could greatly benefit producers affected by weed species that have developed resistance to other herbicides, such as glyphosate-resistant Palmer amaranth. Adoption of the new technology is likely to be rapid and widespread, which will lead to an increase in the amount of dicamba and 2,4-D applied each season. It is well documented that these herbicides are very injurious to soybeans, cotton, tomatoes, and most other broadleaf crops, and their increased use brings an increased risk of physical spray drift onto susceptible crops. Because of these risks, research is being conducted on new herbicide formulation/spray nozzle combinations to identify management options that may minimize physical spray drift.
320

GIS-based Evaluation of Landslide Susceptibility for Eastern Tennessee

Smith, Sara Ann 06 May 2017 (has links)
The Appalachian Mountains in eastern Tennessee are known for landslides, which are reported to cause millions of dollars of damage. To aid in estimating future susceptibility, a geographic information system (GIS) was used to perform a logistic regression identifying landslide-prone areas in eastern Tennessee. Model results were validated using K-fold cross-validation. The results suggest that the environmental variables slope, soil, land cover/vegetation, and distance to roads were significant factors related to landslide susceptibility. The susceptibility map showed that 86.8% of urban areas in eastern Tennessee were at the highest susceptibility for landslides, possibly due to lower amounts of land cover. Overlaying past landslides on the susceptibility map to check accuracy showed that areas of high susceptibility lie along main highways and interstates. This model is a first step in using GIS to increase awareness of landslide susceptibility in the region and may ultimately lead to better preparation.
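The modeling-plus-validation pipeline described above, logistic regression on environmental predictors followed by K-fold cross-validation, can be sketched in a few lines. The sketch below uses gradient descent and a single standardized slope predictor for illustration; the real study uses several raster-derived variables, and all names here are assumptions.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Logistic regression by gradient descent; X includes an intercept column."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def kfold_accuracy(X, y, k=5, seed=0):
    """K-fold cross-validated classification accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        w = fit_logistic(X[train], y[train])
        pred = (X[fold] @ w > 0).astype(float)  # 0.5 probability threshold
        accs.append((pred == y[fold]).mean())
    return np.mean(accs)
```

Applied per map cell, the fitted probabilities become the susceptibility surface; the cross-validated accuracy is the safeguard against reporting an overfit map.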
