1. Investigating the Utility of Age-Dependent Cranial Vault Thickness as an Aging Method for Juvenile Skeletal Remains on Dry Bone, Radiographic and Computed Tomography Scans. Kamnikar, Kelly R, 07 May 2016.
Age estimation, a component of the biological profile, contributes significantly to the creation of a post-mortem profile of an unknown set of human remains. The goals of this study are to: (1) refine the juvenile age estimation method of cranial vault thickness (CVT) through MARS modeling, (2) test the method on known-age samples, and (3) compare CVT and dental development age estimations. Data for this study come from computed tomography (CT) scans, radiographic images, and dry bone. CVT was measured at seven cranial landmarks (nasion, glabella, bregma, vertex, vertex radius, lambda, and opisthocranion). Results indicate that CVT models vary in their predictive ability; vertex and lambda produce the best results. Predicted values and prediction intervals for CVT are wider and less accurate than dental development age estimates. Aging by CVT could benefit from a larger known-age sample composed of individuals older than 6 years.
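A minimal sketch of the kind of MARS fit involved, assuming the third-party py-earth package and invented stand-in data (the study's actual models, landmarks, and samples are not reproduced here):

```python
import numpy as np
from pyearth import Earth  # MARS implementation (py-earth)

rng = np.random.default_rng(0)
# Synthetic stand-in: vault thickness (mm) rising nonlinearly with age (years)
age = rng.uniform(0, 15, 300)
thickness = 2.0 + 0.6 * np.sqrt(age) + rng.normal(0, 0.3, 300)

# MARS places piecewise-linear basis functions at data-driven knots
model = Earth(max_degree=1)
model.fit(thickness.reshape(-1, 1), age)   # predict age from thickness
print(model.summary())                      # shows the selected basis functions
print(model.predict(np.array([[3.5]])))     # age estimate at 3.5 mm thickness
```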
2. An OLS-Based Method for Causal Inference in Observational Studies. Xu, Yuanfang, 07 1900.
Indiana University-Purdue University Indianapolis (IUPUI)

Observational data are frequently used for causal inference of treatment effects on prespecified outcomes. Several widely used causal inference methods have adopted the method of inverse propensity score weighting (IPW) to alleviate the influence of confounding. However, IPW-type methods, including the doubly robust methods, are prone to large variation in the estimation of causal effects due to possible extreme weights. In this research, we developed an ordinary least-squares (OLS)-based causal inference method, which does not involve the inverse weighting of the individual propensity scores.
We first considered the scenario of a homogeneous treatment effect. We proposed a two-stage estimation procedure that leads to a model-free estimator of the average treatment effect (ATE). At the first stage, two summary scores, the propensity and mean scores, are estimated nonparametrically using regression splines. The targeted ATE is then obtained as a plug-in estimator with a closed-form expression. Our simulation studies showed that this model-free estimator of the ATE is consistent, asymptotically normal, and has superior operational characteristics in comparison to the widely used IPW-type methods. We then extended our method to the scenario of heterogeneous treatment effects by adding an additional stage that models the covariate-specific treatment-effect function nonparametrically while maintaining the model-free feature and the simplicity of OLS-based estimation. The estimated covariate-specific function serves as an intermediate step in the estimation of the ATE and can thus be used to study treatment-effect heterogeneity.
We discussed ways of using advanced machine learning techniques in the proposed method to accommodate high-dimensional covariates. We applied the proposed method to a case study evaluating the effect of an early combination of biologic and non-biologic disease-modifying antirheumatic drugs (DMARDs), compared with a step-up treatment plan, in children with newly diagnosed juvenile idiopathic arthritis (JIA). The proposed method gives strong evidence of a significant effect of early combination at the 0.05 level: on average, early aggressive use of biologic DMARDs leads to around 1.2 to 1.7 more reduction in the clinical juvenile disease activity score at 6 months than the step-up plan for treating JIA.
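The dissertation's estimator builds on two nonparametrically estimated summary scores. As a rough, weighting-free illustration of the same spirit, here is a minimal regression-adjustment sketch in which mean scores are fit by OLS on spline bases; the data, the single confounder, and the plug-in formula are invented for illustration and are not the dissertation's actual estimator:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-2, 2, size=(n, 1))                   # a single confounder
propensity = 1 / (1 + np.exp(-x[:, 0]))               # true propensity score
a = rng.binomial(1, propensity)                       # treatment assignment
y = 1.0 * a + np.sin(x[:, 0]) + rng.normal(0, 1, n)   # outcome, true ATE = 1

def spline_ols():
    # OLS on a regression-spline basis; no inverse propensity weighting anywhere
    return make_pipeline(SplineTransformer(degree=3, n_knots=8),
                         LinearRegression())

# Stage 1: estimate mean scores nonparametrically, one model per arm
m1 = spline_ols().fit(x[a == 1], y[a == 1])   # E[Y | A=1, X]
m0 = spline_ols().fit(x[a == 0], y[a == 0])   # E[Y | A=0, X]

# Stage 2: plug-in estimator of the ATE, averaging predicted contrasts
ate_hat = np.mean(m1.predict(x) - m0.predict(x))
print(f"estimated ATE: {ate_hat:.3f} (truth: 1.0)")
```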
3. Robust Conic Quadratic Programming Applied To Quality Improvement: A Robustification Of CMARS. Ozmen, Ayse, 01 October 2010.
In this thesis, we study and use Conic Quadratic Programming (CQP) for purposes of operational research, especially for quality improvement in manufacturing. The importance and benefit of CQP in this area have already been demonstrated in previous works. There, the complexity of the regression method Multivariate Adaptive Regression Splines (MARS), which in particular means sensitivity to noise in the data, was penalized in the form of a so-called Tikhonov regularization, which was expressed and studied as a CQP problem. This led to the new method CMARS; it is more model-based and employs continuous, well-structured convex optimization, which enables the use of interior point methods and their codes such as MOSEK. In this study, we generalize the regression problem by including uncertainty in the model, especially in the input data.
CMARS, recently developed as an alternative to MARS, is powerful in handling complex and heterogeneous data. However, the MARS and CMARS methods assume that the data contain fixed variables; in fact, data include noise in both the output and input variables. Consequently, the solutions of the optimization problem can show a remarkable sensitivity to perturbations in the parameters of the problem. In this study, we include the existence of uncertainty about future scenarios in CMARS and robustify it with robust optimization, which deals with data uncertainty. That kind of optimization was introduced by Aharon Ben-Tal and Arkadi Nemirovski, and used by Laurent El Ghaoui in the area of data mining. It incorporates various kinds of noise and perturbations into the programming problem. This robustification of CQP with robust optimization is compared with previous contributions based on Tikhonov regularization, and with the traditional MARS method.
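A minimal sketch of the conic quadratic (second-order cone) form that Tikhonov-regularized, CMARS-style regression takes, written with cvxpy; the basis matrix, penalty matrix, and bound below are placeholder assumptions rather than the thesis's actual model:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 10
B = rng.normal(size=(n, p))       # stand-in for a MARS basis-function matrix
y = B @ rng.normal(size=p) + 0.1 * rng.normal(size=n)
L = np.eye(p)                     # stand-in for a smoothness penalty matrix
M = 5.0                           # assumed bound on the penalized coefficients

alpha = cp.Variable(p)
t = cp.Variable()

# Tikhonov regularization as a CQP: minimize the residual norm subject to
# a second-order cone bound on the penalized coefficient vector.
prob = cp.Problem(cp.Minimize(t),
                  [cp.SOC(t, B @ alpha - y),              # ||B a - y||_2 <= t
                   cp.norm(L @ alpha) <= np.sqrt(M)])     # ||L a||_2 <= sqrt(M)
prob.solve()   # any SOCP-capable solver works (e.g., ECOS, Clarabel, MOSEK)
print("optimal residual norm:", t.value)
```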
4. An Efficient Robust Concept Exploration Method and Sequential Exploratory Experimental Design. Lin, Yao, 31 August 2004.
Experimentation and approximation are essential for efficiency and effectiveness in concurrent engineering analyses of large-scale complex systems. The approximation-based design strategy is not fully utilized in industrial applications, in which designers have to deal with multi-disciplinary, multi-variable, multi-response, and multi-objective analyses using very complicated and expensive-to-run computer analysis codes or physical experiments. With current experimental design and metamodeling techniques, it is difficult for engineers to develop acceptable metamodels for irregular responses and to achieve good design solutions in large design spaces at low cost. To circumvent this problem, engineers tend either to adopt low-fidelity simulations or models, with which important response properties may be lost, or to restrict the study to very small design spaces. Information from expensive physical or computer experiments is often used for validation in late design stages rather than as an analysis tool in early-stage design. This increases the possibility of expensive re-design and lengthens time-to-market.

In this dissertation, two methods, the Sequential Exploratory Experimental Design (SEED) and the Efficient Robust Concept Exploration Method (E-RCEM), are developed to address these problems. The SEED and E-RCEM methods help develop acceptable metamodels for irregular responses with expensive experiments and achieve satisficing design solutions in large design spaces with limited computational or monetary resources. It is verified that more accurate metamodels are developed and better design solutions are achieved with SEED and E-RCEM than with traditional approximation-based design methods. SEED and E-RCEM facilitate the full utility of the simulation-and-approximation-based design strategy in engineering and scientific applications.

Several preliminary approaches for metamodel validation with additional validation points are proposed in this dissertation, after verifying that the most widely used method, leave-one-out cross-validation, is theoretically inappropriate for testing the accuracy of metamodels. A comparison of the performance of kriging and MARS metamodels is also presented. A sequential metamodeling approach is then proposed to utilize different types of metamodels along the design timeline.

Several single-variable and two-variable examples, and two engineering examples, the design of pressure vessels and the design of unit cells for linear cellular alloys, are used in this dissertation to support these studies.
5. Predicting bid prices in construction projects using non-parametric statistical models. Pawar, Roshan, 15 May 2009.
Bidding is a very competitive process in the construction industry; each competitor's business depends on winning or losing these bids. Contractors would like to predict the bids that may be submitted by their competitors, which would help them obtain contracts and grow their business. Unit prices estimated for each quantity differ from contractor to contractor. These unit costs depend on factors such as the historical data used for estimating unit costs, vendor quotes, market surveys, the amount of material estimated, the number of projects the contractor is working on, equipment rental costs, the amount of equipment owned by the contractor, and the risk averseness of the estimator. These factors are broadly similar when estimators are estimating the costs of similar projects. Thus, there is a relationship between the projects that a particular contractor has bid in previous years and the cost the contractor is likely to quote for future projects, and this relationship can be used to predict the bids that the contractor might quote for future projects. For example, a contractor may use historical data from a certain year for bidding on a certain type of project; the unit prices may be adjusted for size, time, and location, but the basis for bidding on projects of similar types is the same. Statistical tools can be used to model the underlying relationship between the final cost of the project quoted by a contractor and the quantities of materials or amount of tasks performed in a project. There are a number of statistical modeling techniques, but a model used for predicting costs should be flexible enough to adapt to any underlying pattern.
Data such as the amount of work to be performed for a certain line item, a material cost index, a labor cost index, and a unique identifier for each participating contractor are used to predict the bids that a contractor might quote for a certain project. To perform the analysis, artificial neural networks and multivariate adaptive regression splines are used. The results obtained from the two techniques are compared, and multivariate adaptive regression splines are found to predict the cost better than artificial neural networks.
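A minimal sketch of such a comparison, assuming the third-party py-earth package for MARS and scikit-learn's multilayer perceptron for the ANN; the feature names mirror the abstract, but the data are synthetic, and the conclusion on real bid data is the thesis's finding, not this sketch's:

```python
import numpy as np
from pyearth import Earth                      # MARS implementation (py-earth)
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([
    rng.uniform(100, 10_000, n),   # quantity for a line item
    rng.uniform(90, 130, n),       # material cost index
    rng.uniform(90, 130, n),       # labor cost index
])
# Synthetic bid: quantity scaled by the cost indices, plus noise
y = 5.0 * X[:, 0] * (X[:, 1] + X[:, 2]) / 200 + rng.normal(0, 5_000, n)

mars = Earth(max_degree=2)
ann = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(32, 16),
                                 max_iter=2000, random_state=0))

for name, model in [("MARS", mars), ("ANN", ann)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean CV R^2 = {scores.mean():.3f}")
```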
6. Bayesian Hierarchical, Semiparametric, and Nonparametric Methods for International New Product Diffusion. Hartman, Brian Matthew, August 2010.
Global marketing managers are keenly interested in being able to predict the sales of their new products. Understanding how a product is adopted over time allows the managers to optimally allocate their resources. With the world becoming ever more global, there are strong and complex interactions between the countries of the world. My work explores how to describe the relationships between those countries and determines the best way to leverage that information to improve sales predictions.
In Chapter II, I describe how diffusion speed has changed over time. The most recent major study on this topic, by Christophe Van den Bulte, investigated new product diffusions in the United States. Van den Bulte notes that a similar study is needed in the international context, especially in developing countries. Additionally, his model contains the implicit assumption that the diffusion speed parameter is constant throughout the life of a product. I model the time component as a nonparametric function, allowing the speed parameter the flexibility to change over time. I find that early in the product's life, the speed parameter is higher than expected. Additionally, as the Internet has grown in popularity, the speed parameter has increased.
In Chapter III, I examine whether the interactions can be described through a reference hierarchy in addition to the cross-country word-of-mouth effects already in the literature. I also expand the word-of-mouth effect by relating the magnitude of the effect to the distance between the two countries. The current literature only applies that effect equally to the n closest countries (forming a neighbor set). This also leads to an analysis of how best to measure the distance between two countries. I compare four possible distance measures: distance between the population centroids, trade flow, tourism flow, and cultural similarity. Including the reference hierarchy improves the predictions by 30 percent over the current best model.
Finally, in Chapter IV, I look more closely at the Bass diffusion model. It is prominently used in the marketing literature and is the basis of my analysis in Chapter III. All of the current formulations include the implicit assumption that the regression parameters are equal for each country, yet a one-dollar increase in GDP should have more of an effect in a poor country than in a rich country. A Dirichlet process prior enables me to cluster the countries by their regression coefficients. Incorporating the distance measures can improve the predictions by 35 percent in some cases.
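For reference, a minimal sketch of the standard Bass diffusion curve that Chapters III and IV build on; the parameter values below are arbitrary illustrations, not estimates from this work:

```python
import numpy as np

def bass_adoption(t, p, q, m):
    """Cumulative adopters at time t under the Bass model.

    F(t) = (1 - exp(-(p + q) t)) / (1 + (q / p) exp(-(p + q) t)),
    where p is the innovation coefficient, q the imitation coefficient,
    and m the market potential.
    """
    e = np.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)

t = np.arange(0, 15)
cumulative = bass_adoption(t, p=0.03, q=0.38, m=1_000_000)  # typical magnitudes
new_adopters = np.diff(cumulative)                          # per-period sales
print(np.round(new_adopters[:5]))
```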
7. An Algorithm For The Forward Step Of Adaptive Regression Splines Via Mapping Approach. Kartal Koc, Elcin, 01 September 2012.
In high-dimensional data modeling, Multivariate Adaptive Regression Splines (MARS) is a well-known nonparametric regression technique for approximating the nonlinear relationship between a response variable and the predictors with the help of splines. MARS uses piecewise linear basis functions, separated from each other at breaking points (knots), for function estimation. The model is built in a two-step procedure: forward selection and backward elimination. In the first step, a general model including too many basis functions, and hence too many knot points, is generated; in the second, the basis functions contributing least to the overall fit are eliminated. In the conventional adaptive spline procedure, knots are selected from the set of distinct data points, which makes the forward selection procedure computationally expensive and leads to high local variance. To avoid these drawbacks, the knot points can be selected from a subset of the data points, which amounts to data reduction. In this study, a new method (called S-FMARS) is proposed that selects the knot points using a self-organizing map-based approach, which transforms the original data points to a lower-dimensional space. Thus, fewer knot points need to be evaluated for model building in the forward selection step of the MARS algorithm. Results obtained from simulated datasets and six real-world datasets show that the proposed method is time-efficient in model construction without degrading model accuracy or prediction performance. In this study, the proposed approach is implemented in the MARS and CMARS methods as an alternative to their forward step, improving them by decreasing their computing time.
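A minimal sketch of the data-reduction idea, assuming the third-party minisom package: train a small self-organizing map and use its codebook vectors as the reduced candidate-knot set. The map size and training settings are arbitrary, and this shows only the flavor of S-FMARS, not its exact algorithm:

```python
import numpy as np
from minisom import MiniSom  # lightweight SOM implementation

rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 4))   # original data points: 5000 knot candidates

# Train a small SOM; its codebook vectors summarize the data distribution.
som = MiniSom(8, 8, input_len=4, sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(X, num_iteration=1000)

# Use the 64 codebook vectors as the reduced candidate-knot set, so the
# forward step of MARS evaluates 64 candidates instead of 5000.
candidate_knots = som.get_weights().reshape(-1, 4)
print(candidate_knots.shape)     # (64, 4)
```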
8. A Computational Approach To Nonparametric Regression: Bootstrapping CMARS Method. Yazici, Ceyda, 01 September 2011.
Bootstrapping is a resampling technique that treats the original data set as a population and draws samples from it with replacement. The technique is widely used, especially in mathematically intractable problems. In this study, it is used to obtain the empirical distributions of the parameters, in order to determine whether they are statistically significant, in a special case of nonparametric regression, Conic Multivariate Adaptive Regression Splines (CMARS). The CMARS method, which uses conic quadratic optimization, is a modified version of a well-known nonparametric regression model, Multivariate Adaptive Regression Splines (MARS). Although it performs better with respect to several criteria, the CMARS model is more complex than that of MARS. To overcome this problem, and to improve CMARS performance further, three different bootstrapping regression methods, namely Random-X, Fixed-X, and Wild bootstrap, are applied to four data sets of different sizes and scales. The performances of the models are then compared using various criteria, including accuracy, precision, complexity, stability, robustness, and efficiency. Random-X yields more precise, accurate, and less complex models, particularly for medium-size and medium-scale data, even though it is the least efficient method.
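A minimal sketch of the three resampling schemes for a plain linear fit, in NumPy; the Rademacher weights in the wild bootstrap are one common choice among several:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

beta_hat = ols(X, y)
resid = y - X @ beta_hat

def one_replicate(scheme):
    if scheme == "random-x":                 # resample (x_i, y_i) pairs
        idx = rng.integers(0, n, n)
        return ols(X[idx], y[idx])
    if scheme == "fixed-x":                  # keep X fixed, resample residuals
        e = rng.choice(resid, size=n, replace=True)
        return ols(X, X @ beta_hat + e)
    if scheme == "wild":                     # keep X and residuals, flip signs
        v = rng.choice([-1.0, 1.0], size=n)  # Rademacher weights
        return ols(X, X @ beta_hat + resid * v)

for scheme in ["random-x", "fixed-x", "wild"]:
    boots = np.array([one_replicate(scheme) for _ in range(500)])
    print(scheme, "slope SE:", boots[:, 1].std().round(4))
```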
9. A comparison of some methods of modeling baseline hazard function in discrete survival models. Mashabela, Mahlageng Retang, 20 September 2019.
MSc (Statistics), Department of Statistics

The baseline parameter vector in a discrete-time survival model is determined by the number of time points: the larger the number of time points, the higher the dimension of the baseline parameter vector, which often leads to biased maximum likelihood estimates. One way to overcome this problem is to use a simpler parametrization that contains fewer parameters. A simulation approach was used to compare the accuracy of three variants of penalised regression spline methods in smoothing the baseline hazard function. Root mean squared error (RMSE) analysis suggests that all the smoothing methods generally performed better than the model with a discrete baseline hazard function, although no single smoothing method outperformed the others. These methods were also applied to data on age at first alcohol intake in Thohoyandou. The results from the real-data application suggest that there were no significant differences among the estimated models. Consumption of other drugs, having a parent who drinks, being male, and having been abused in life are associated with high chances of drinking alcohol very early in life.

Funding: NRF
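A minimal sketch of the underlying modeling choice, using statsmodels on invented person-period data: a discrete-time hazard fit by logistic regression, once with one parameter per time point and once with a low-dimensional spline basis on time. The thesis compares penalised regression splines; the unpenalised B-spline basis below only illustrates the dimension-reduction idea:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
# Person-period ("long") data: one row per subject per time point at risk,
# with event = 1 in the interval where the event occurs.
rows = []
for i in range(300):
    x = rng.normal()
    for t in range(1, 13):
        h = 1 / (1 + np.exp(-(-3.0 + 0.15 * t + 0.5 * x)))  # true hazard
        event = rng.random() < h
        rows.append({"id": i, "time": t, "x": x, "event": int(event)})
        if event:
            break
df = pd.DataFrame(rows)

# Discrete baseline: one dummy per time point (many parameters).
m_discrete = smf.glm("event ~ C(time) + x", df,
                     family=sm.families.Binomial()).fit()
# Smoothed baseline: a 4-dimensional B-spline basis on time.
m_spline = smf.glm("event ~ bs(time, df=4) + x", df,
                   family=sm.families.Binomial()).fit()
print(m_discrete.aic, m_spline.aic)  # fewer baseline parameters, similar fit
```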
10. Semiparametric Varying Coefficient Models for Matched Case-Crossover Studies. Ortega Villa, Ana Maria, 23 November 2015.
Semiparametric modeling is a combination of parametric and nonparametric models in which some functions follow a known form and others follow an unknown form. In this dissertation we make contributions to semiparametric modeling for matched case-crossover data.

In matched case-crossover studies, it is generally accepted that the covariates on which a case and associated controls are matched cannot exert a confounding effect on independent predictors included in the conditional logistic regression model: any stratum effect is removed by conditioning on the fixed number of cases and controls in the stratum. However, some matching covariates, such as time and/or spatial location, often play an important role as effect modifiers, and failure to include them leads to incorrect statistical estimation, prediction, and inference. Hence, in this dissertation we propose several approaches that allow the inclusion of time and spatial location, as well as other effect modifications such as heterogeneous subpopulations, in the data.

To address modification due to time, three methods are developed: the first is a parametric approach, the second is a semiparametric penalized approach, and the third is a semiparametric Bayesian approach. We demonstrate the advantage of the one-stage semiparametric approaches using both a simulation study and an epidemiological example of a 1-4 bi-directional case-crossover study of childhood aseptic meningitis with drinking water turbidity.

To address modifications due to time and spatial location, two methods are developed. The first is a semiparametric spatial-temporal varying coefficient model for a small number of locations; the second is a semiparametric spatial-temporal varying coefficient model appropriate when the number of locations among the subjects is medium to large. We demonstrate the accuracy of these approaches using simulation studies and, when appropriate, an epidemiological example of a 1-4 bi-directional case-crossover study.

Finally, to explore further effect modifications by heterogeneous subpopulations among strata, we propose a nonparametric Bayesian approach constructed with Dirichlet process priors, which clusters subpopulations and assesses heterogeneity. We demonstrate the accuracy of our approach using a simulation study, as well as an example of a 1-4 bi-directional case-crossover study.

Ph.D.
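A minimal sketch of the conditional logistic regression at the core of a 1-4 matched case-crossover analysis, using statsmodels on synthetic strata; the varying-coefficient, Bayesian, and Dirichlet-process extensions in the dissertation go well beyond this baseline:

```python
import numpy as np
import pandas as pd
from statsmodels.discrete.conditional_models import ConditionalLogit

rng = np.random.default_rng(6)
rows = []
for s in range(200):                  # one stratum per case with 4 control times
    exposure = rng.normal(size=5)     # exposure at case time + 4 control times
    # Within-stratum case probability driven by exposure (true log-OR = 0.7)
    logits = 0.7 * exposure
    case = rng.choice(5, p=np.exp(logits) / np.exp(logits).sum())
    for j in range(5):
        rows.append({"stratum": s, "y": int(j == case),
                     "exposure": exposure[j]})
df = pd.DataFrame(rows)

# Conditioning on one case per stratum removes all stratum-level effects.
model = ConditionalLogit(df["y"], df[["exposure"]], groups=df["stratum"])
print(model.fit().summary())
```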