361 |
Evaluating Model-based Trees in Practice. Zeileis, Achim; Hothorn, Torsten; Hornik, Kurt. January 2006 (has links) (PDF)
A recently suggested algorithm for recursive partitioning of statistical models (Zeileis, Hothorn and Hornik, 2005), such as models estimated by maximum likelihood or least squares, is evaluated in practice. The general algorithm is applied to linear regression, logistic regression and survival regression, and to economic and medical regression problems. Furthermore, its performance with respect to prediction quality and model complexity is compared in a benchmark study with a large collection of other tree-based algorithms, showing that the algorithm yields interpretable trees that are competitive with previously suggested approaches. / Series: Research Report Series / Department of Statistics and Mathematics
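A minimal sketch of the partitioning idea for the linear-regression case is given below. It assumes a pandas DataFrame with hypothetical column names and deliberately caricatures the split selection: the published algorithm chooses splits via parameter-instability tests, whereas this sketch simply splits where the residual sum of squares of the two sub-models drops most.

```python
# Simplified caricature of model-based recursive partitioning (linear case).
# The published algorithm selects splits with parameter-instability tests;
# here the criterion is the drop in residual sum of squares, which only
# illustrates the recursive fit-and-split structure. Column names are
# hypothetical.
import numpy as np
import pandas as pd

def fit_ols(y, X):
    """OLS with intercept; returns (coefficients, residual sum of squares)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta, float(np.sum((y - X1 @ beta) ** 2))

def mob_sketch(data, y_col, x_cols, part_cols, min_leaf=30, depth=0, max_depth=3):
    """Fit a linear model, then split on the partitioning variable and
    cut-point that most improve the fit of the two sub-models; recurse."""
    beta, parent_rss = fit_ols(data[y_col].to_numpy(), data[x_cols].to_numpy())
    best = None
    for z in part_cols:
        for cut in np.quantile(data[z], [0.25, 0.5, 0.75]):
            left, right = data[data[z] <= cut], data[data[z] > cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            child_rss = (fit_ols(left[y_col].to_numpy(), left[x_cols].to_numpy())[1]
                         + fit_ols(right[y_col].to_numpy(), right[x_cols].to_numpy())[1])
            if best is None or child_rss < best[0]:
                best = (child_rss, z, cut, left, right)
    if best is None or depth >= max_depth or best[0] >= parent_rss:
        return {"type": "leaf", "n": len(data), "coef": beta}
    _, z, cut, left, right = best
    return {"type": "split", "var": z, "cut": float(cut),
            "left": mob_sketch(left, y_col, x_cols, part_cols, min_leaf, depth + 1, max_depth),
            "right": mob_sketch(right, y_col, x_cols, part_cols, min_leaf, depth + 1, max_depth)}

# Tiny synthetic example: the regression coefficient changes with z.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=400), "z": rng.uniform(size=400)})
df["y"] = np.where(df["z"] > 0.5, 2.0, -2.0) * df["x"] + rng.normal(0, 0.1, 400)
tree = mob_sketch(df, "y", ["x"], ["z"])
print(tree["var"], round(tree["cut"], 2))   # splits on z near 0.5
```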
|
362 |
Minimum description length, regularisation and multi-modal data. Van der Rest, John C. January 1995 (has links)
Conventional feed-forward neural networks have used the sum-of-squares cost function for training. A new cost function is presented here with a description-length interpretation based on Rissanen's Minimum Description Length principle: it is a heuristic that can roughly be read as the number of data points fitted by the model. Rather than seeking optimal descriptions, the cost function forms minimum descriptions in a deliberately naive way for computational convenience, and is therefore called the Naive Description Length cost function. Finding minimum-description models is shown to be closely related to identifying clusters in the data. As a consequence, the minimum of this cost function approximates the most probable mode of the data, whereas the sum-of-squares cost function approximates the mean. The new cost function is shown to provide information about the structure of the data by inspecting how the error depends on the amount of regularisation. This structure provides a method of selecting regularisation parameters as an alternative or supplement to Bayesian methods. The new cost function is tested on a number of multi-valued problems, such as a simple inverse kinematics problem, as well as on classification and regression problems. The mode-seeking property of this cost function is shown to improve prediction in time series problems. Description-length principles are used in a similar fashion to derive a regulariser to control network complexity.
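The mode-versus-mean distinction can be illustrated without reference to the thesis's cost function itself. The sketch below is purely illustrative (it does not implement the Naive Description Length criterion): on bimodal data, the constant predictor minimising sum-of-squares is the mean, which falls between the modes, while a kernel-density mode estimate sits on one of them.

```python
# Illustration only: for multi-modal targets, the sum-of-squares optimum
# (the mean) falls between the modes, while a mode estimate does not.
# This is NOT the thesis's Naive Description Length cost function.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Bimodal target, e.g. two valid outputs for the same input (inverse problem).
y = np.concatenate([rng.normal(-2.0, 0.3, 500), rng.normal(2.0, 0.3, 500)])

mean_fit = y.mean()                      # minimiser of sum-of-squares
grid = np.linspace(y.min(), y.max(), 1000)
kde = gaussian_kde(y)
mode_fit = grid[np.argmax(kde(grid))]    # most probable value

print(f"sum-of-squares optimum (mean): {mean_fit:+.2f}")   # near 0, between modes
print(f"density mode:                  {mode_fit:+.2f}")   # near one of the modes
```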
|
363 |
Approximating differentiable relationships between delay embedded dynamical systems with radial basis functions. Potts, Michael Alan Sherred. January 1996 (has links)
This thesis is about the study of relationships between experimental dynamical systems. The basic approach is to fit radial basis function maps between time delay embeddings of manifolds. We have shown that under certain conditions these maps are generically diffeomorphisms, and can be analysed to determine whether or not the manifolds in question are diffeomorphically related to each other. If not, a study of the distribution of errors may provide information about the lack of equivalence between the two. The method has applications wherever two or more sensors are used to measure a single system, or where a single sensor can respond on more than one time scale: their respective time series can be tested to determine whether or not they are coupled, and to what degree. One application which we have explored is the determination of a minimum embedding dimension for dynamical system reconstruction. In this special case the diffeomorphism in question is closely related to the predictor for the time series itself. Linear transformations of delay embedded manifolds can also be shown to have nonlinear inverses under the right conditions, and we have used radial basis functions to approximate these inverse maps in a variety of contexts. This method is particularly useful when the linear transformation corresponds to the delay embedding of a finite impulse response filtered time series. One application of fitting an inverse to this linear map is the detection of periodic orbits in chaotic attractors, using suitably tuned filters. This method has also been used to separate signals with known bandwidths from deterministic noise, by tuning a filter to stop the signal and then recovering the chaos with the nonlinear inverse. The method may have applications to the cancellation of noise generated by mechanical or electrical systems. In the course of this research a sophisticated piece of software has been developed. The program allows the construction of a hierarchy of delay embeddings from scalar and multi-valued time series. The embedded objects can be analysed graphically, and radial basis function maps can be fitted between them asynchronously, in parallel, on a multi-processor machine. In addition to a graphical user interface, the program can be driven by a batch mode command language, incorporating the concept of parallel and sequential instruction groups and enabling complex sequences of experiments to be performed in parallel in a resource-efficient manner.
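A minimal sketch of the core construction is given below: two synthetic scalar series standing in for two sensors, a time-delay embedding of each, and a radial basis function map fitted from one embedded manifold to the other. SciPy's RBFInterpolator is used as a stand-in for the thesis's own parallel fitting software, and all signals and parameters are illustrative.

```python
# Sketch: delay-embed two simultaneously measured scalar series and fit an
# RBF map from one embedded manifold to the other. SciPy's RBFInterpolator
# stands in for the thesis's own fitting software; data are synthetic.
import numpy as np
from scipy.interpolate import RBFInterpolator

def delay_embed(x, dim, tau):
    """Time-delay embedding: each row holds dim samples spaced tau apart."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[(dim - 1 - k) * tau : (dim - 1 - k) * tau + n]
                            for k in range(dim)])

# Two "sensors" observing the same underlying dynamics (synthetic example).
t = np.linspace(0, 60, 3000)
s1 = np.sin(t) + 0.5 * np.sin(3.1 * t)
s2 = np.cos(t) ** 3                       # a different observable of the same system

dim, tau = 4, 10
X, Y = delay_embed(s1, dim, tau), delay_embed(s2, dim, tau)

# Fit the map on the first half of the data, test on the second half.
half = len(X) // 2
rbf_map = RBFInterpolator(X[:half], Y[:half], kernel="thin_plate_spline",
                          smoothing=1e-6)
pred = rbf_map(X[half:])
rms_error = np.sqrt(np.mean((pred - Y[half:]) ** 2))
print(f"out-of-sample RMS error of the fitted map: {rms_error:.3e}")
```

A small prediction error over held-out points suggests the two embeddings are (approximately) functionally related; large or structured errors would instead point to a lack of equivalence, as discussed in the abstract.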
|
364 |
Numerical analysis of pile test data from instrumented large diameter bored piles formed in Keuper marl (Mercia mudstone). Omer, J. R. January 1998 (has links)
No description available.
|
365 |
Evaluation and application of the Bank Assessment for Non-Point Source Consequences of Sediment (BANCS) model developed to predict annual streambank erosion rates. Bigham, Kari A. January 1900 (has links)
Master of Science / Department of Biological & Agricultural Engineering / Trisha L. Moore / Excess sediment is a leading cause of stream impairment in the United States, resulting in poor water quality, sedimentation of downstream waterbodies, and damage to aquatic ecosystems. Numerous case studies have found that accelerated bank erosion can be the main contributor of sediment in impaired streams. An empirically-derived "Bank Assessment for Non-Point Source Consequences of Sediment" (BANCS) model can be developed for a specific hydrophysiographic region to rapidly estimate sediment yield from streambank erosion, based on both physical and observational measurements of a streambank. This study aims to address model criticisms by (1) evaluating the model’s repeatability and sensitivity and (2) examining the developmental process of a BANCS model by attempting to create an annual streambank erosion rate prediction curve for the Central Great Plains ecoregion.
To conduct the repeatability and sensitivity analysis of the BANCS model, ten stream professionals with experience utilizing the model individually evaluated the same six streambanks twice in the summer of 2015. To determine the model's repeatability, individual streambank evaluations, as well as groups of evaluations based on level of Rosgen course training, were compared utilizing Kendall's coefficient of concordance and a linear model with a randomized complete block design. Additionally, a one-at-a-time design approach was implemented to test the sensitivity of the model inputs. Statistical analysis of individual streambank evaluations suggests that the implementation of the BANCS model may not be repeatable. This may be due to highly sensitive model inputs, such as streambank height and near-bank stress method selection, and/or highly uncertain model inputs, such as bank material. Furthermore, it was found that a higher level of training may improve the precision of model implementation.
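Kendall's coefficient of concordance, used above to quantify agreement among the ten assessors, can be computed directly from the raters' rankings. The sketch below implements the standard formula W = 12S / (m^2 (n^3 - n)) for m raters and n streambanks, without a tie correction; the ratings in it are synthetic, not the study's data.

```python
# Kendall's coefficient of concordance W for m raters ranking n items,
# as used to test repeatability of BEHI/NBS ratings across assessors.
# Synthetic ratings below are illustrative; no tie correction is applied.
import numpy as np
from scipy.stats import rankdata

def kendalls_w(ratings):
    """ratings: (m_raters, n_items) array of scores; higher = more erodible."""
    m, n = ratings.shape
    ranks = np.apply_along_axis(rankdata, 1, ratings)  # rank within each rater
    rank_sums = ranks.sum(axis=0)
    s = np.sum((rank_sums - rank_sums.mean()) ** 2)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Ten assessors scoring the same six streambanks would give one such matrix
# per assessment round.
rng = np.random.default_rng(42)
true_erodibility = np.array([10, 22, 35, 18, 41, 27], dtype=float)
ratings = true_erodibility + rng.normal(0, 6, size=(10, 6))  # rater noise

w = kendalls_w(ratings)
print(f"Kendall's W = {w:.2f}  (1 = perfect agreement, 0 = none)")
```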
In addition to the repeatability and sensitivity analysis, the BANCS model developmental process was examined through the creation of a provisional streambank erosion rate prediction curve for the Central Great Plains ecoregion. Streambank erosion data were collected sporadically from 2006 to 2016 from eighteen study banks within the sediment-impaired Little Arkansas River watershed of south-central Kansas. The model fit was observed to follow the same trends, but with greater dispersion, as other models developed throughout the United States and in eastern India. This increase in variability could be due to (1) obtaining streambank erosion data sporadically over a 10-year period with variable streamflows, (2) BEHI/NBS ratings obtained only once in recent years, masking the spatiotemporal variability of streambank erosion, (3) a lack of observations, and (4) the use of both bank profiles and bank pin measurements to calculate average retreat rates.
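BANCS-style prediction curves are commonly built by regressing measured annual retreat rates against bank stability scores on a logarithmic scale; the sketch below shows that general construction with synthetic data and hypothetical values, not the provisional Central Great Plains curve developed in the study.

```python
# Sketch of how a BANCS-style prediction curve can be built: regress measured
# annual bank retreat against a stability score on a log scale, giving an
# exponential erosion-rate curve. All data below are synthetic.
import numpy as np

# Hypothetical study banks: near-bank stress (NBS) rating and measured
# retreat (ft/yr) from bank pins / resurveyed bank profiles.
nbs_rating = np.array([1.2, 1.8, 2.5, 3.1, 3.9, 4.4, 5.2, 6.0])
retreat_ft_yr = np.array([0.05, 0.08, 0.15, 0.22, 0.40, 0.55, 1.10, 1.80])

# Linear fit in log space: log10(retreat) = a * NBS + b
a, b = np.polyfit(nbs_rating, np.log10(retreat_ft_yr), deg=1)

def predicted_retreat(nbs):
    """Erosion-rate curve: predicted annual retreat (ft/yr) for an NBS score."""
    return 10 ** (a * nbs + b)

print(f"fitted curve: retreat = 10^({a:.3f}*NBS {b:+.3f})")
print(f"predicted retreat at NBS 4.0: {predicted_retreat(4.0):.2f} ft/yr")
```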
Based on the results of this study, a detailed model creation procedure was suggested that addresses several model limitations and criticisms. Recommendations provided in the methodology include (1) more accurate measurement of sensitive/uncertain BEHI/NBS parameters, (2) multiple assessments by trained professionals to obtain accurate and precise BEHI/NBS ratings, (3) the use of repeated bank profiles to calculate bank erosion rates, and (4) the development of flow-dependent curves based on annually assessed study banks. Subsequent studies should incorporate these findings to improve upon the suggested methodology and increase the predictive power of future BANCS models.
|
366 |
Quality Market: Design and Field Study of Prediction Market for Software Quality Control. Krishnamurthy, Janaki. 01 January 2010
Given the increasing competition in the software industry and the critical consequences of software errors, it has become important for companies to achieve high levels of software quality. While cost reduction and timeliness of projects continue to be important measures, software companies are placing increasing attention on identifying the user needs and better defining software quality from a customer perspective. Software quality goes beyond just correcting the defects that arise from any deviations from the functional requirements. System engineers also have to focus on a large number of quality requirements such as security, availability, reliability, maintainability, performance and temporal correctness requirements. The fulfillment of these run-time observable quality requirements is important for customer satisfaction and project success.
Generating early forecasts of potential quality problems can have significant benefits for quality improvement. One approach to better software quality is to improve the overall development cycle in order to prevent the introduction of defects and improve run-time quality factors. Many methods and techniques are available to forecast the quality of an ongoing project, such as statistical models, opinion polls and survey methods. These methods have known strengths and weaknesses, and accurate forecasting remains a major issue.
This research utilized a novel approach based on prediction markets, which have proved useful in a variety of situations. In a prediction market for software quality, individual estimates from diverse project stakeholders such as project managers, developers, testers, and users were collected at various points in time during the project. Analogous to the financial futures markets, a security (or contract) was defined that represents the quality requirements, and the various stakeholders traded the securities using the prevailing market price and their private information. The equilibrium market price represents the best aggregate of the diverse opinions. Among the many software quality factors, this research focused on predicting software correctness.
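The abstract does not state which market mechanism was used; the sketch below uses Hanson's logarithmic market scoring rule (LMSR), a common automated market maker for binary prediction markets, purely to illustrate how individual trades move the market price that serves as the aggregate forecast of meeting a correctness target.

```python
# Illustration of a binary prediction market using Hanson's logarithmic
# market scoring rule (LMSR). The study's actual mechanism is not specified
# in the abstract; this is a generic example of price aggregation.
import math

class LMSRMarket:
    """Binary market on 'the release meets the correctness target'."""
    def __init__(self, liquidity=100.0):
        self.b = liquidity          # higher b = prices move less per trade
        self.q_yes = 0.0            # outstanding YES shares
        self.q_no = 0.0             # outstanding NO shares

    def _cost(self, q_yes, q_no):
        return self.b * math.log(math.exp(q_yes / self.b) + math.exp(q_no / self.b))

    def price_yes(self):
        """Current price of YES, interpretable as the market's probability."""
        e_yes = math.exp(self.q_yes / self.b)
        e_no = math.exp(self.q_no / self.b)
        return e_yes / (e_yes + e_no)

    def buy_yes(self, shares):
        """Cost a trader pays for `shares` YES shares at the current state."""
        cost = self._cost(self.q_yes + shares, self.q_no) - self._cost(self.q_yes, self.q_no)
        self.q_yes += shares
        return cost

market = LMSRMarket(liquidity=100.0)
print(f"opening price: {market.price_yes():.2f}")      # 0.50 with no trades
market.buy_yes(60)                                     # e.g. an optimistic tester
print(f"price after trade: {market.price_yes():.2f}")  # moves toward 1
```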
The goal of the study was to evaluate whether a suitably designed prediction market would generate a more accurate estimate of software quality than a survey method that polls subjects. Data were collected using a live software project in three stages: the requirements phase, an early release phase and a final release phase. The efficacy of the market was tested by (i) comparing market outcomes to the final project outcome and (ii) comparing market outcomes to the results of an opinion poll.
Analysis of data suggests that predictions generated using the prediction market are significantly different from those generated using polls at early release and final release stages. The prediction market estimates were also closer to the actual probability estimates for quality compared to the polls. Overall, the results suggest that suitably designed prediction markets provide better forecasts of potential quality problems than polls.
|
367 |
An evaluation of the efficiency of personal information as embodied in a personal history record as a means of predicting academic success at the college freshman level. Marriott, John C. January 1954 (has links)
Thesis (Ed.D.)--Boston University.
|
368 |
Data Mining for Car Insurance Claims Prediction. Huangfu, Dan. 27 April 2015
A key challenge for the insurance industry is to charge each customer an appropriate price for the risk they represent. Risk varies widely from customer to customer, and a deep understanding of different risk factors helps predict the likelihood and cost of insurance claims. The goal of this project is to see how well various statistical methods perform in predicting bodily injury liability insurance claim payments based on the characteristics of the insured customer's vehicles, for this particular dataset from Allstate Insurance Company. We tried several statistical methods, including logistic regression, Tweedie's compound gamma-Poisson model, principal component analysis (PCA), response averaging, and regression and decision trees. Of all the models we tried, PCA combined with a regression tree produced the best results. This is somewhat surprising given the widespread use of the Tweedie model for insurance claim prediction problems.
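A minimal scikit-learn version of the best-performing combination reported above, PCA followed by a regression tree, is sketched below. The feature matrix and claim-payment target are synthetic stand-ins with hypothetical names; the Allstate data and the tuning used in the project are not reproduced.

```python
# Sketch of the best-performing combination reported above: PCA to compress
# correlated vehicle characteristics, then a regression tree on the
# components. Features and target are synthetic stand-ins.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in: vehicle characteristics and bodily-injury claim payments.
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 20))                       # vehicle features
payment = np.maximum(0, 200 + X[:, 0] * 150 + X[:, 3] * 80
                     + rng.normal(0, 100, 5000))      # mostly small, some zero

X_train, X_test, y_train, y_test = train_test_split(X, payment, random_state=0)

model = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=8)),                     # compress correlated features
    ("tree", DecisionTreeRegressor(max_depth=5, min_samples_leaf=50)),
])
model.fit(X_train, y_train)
print(f"test MAE: {mean_absolute_error(y_test, model.predict(X_test)):.1f}")
```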
|
369 |
Developmental hip dysplasia: predicting outcome and implications for secondary procedures. Firth, Gregory Bodley. 28 April 2009
ABSTRACT
A group of 133 hips with developmental dysplasia of the hip (DDH) is reviewed in the form of a clinical audit. The aim of the study is to determine whether the ossific nucleus centre edge angle (ONCEA) can be used to predict the final outcome and the need for a secondary procedure at an earlier age than is currently possible. The ONCEA is defined as an approximation of the lowest centre edge angle within six months of removal of the Batchelor plaster of Paris (POP) cast, following reduction (mean age 24.1 months). It is measured earlier than the centre edge angle (CEA), which is generally used from the age of five years.
The ONCEA was divided into three groups:
- Reduced (>=10°) – Group A
- Mild subluxation (-9° to 9°) – Group B
- Severe subluxation (<=-10°) – Group C
The significance of the ONCEA was confirmed using the ratio of the ONCEA to the acetabular index (the ONCEA/AI ratio), which was also divided into three groups:
- Reduced (>0.5) – Group A
- Mild subluxation (0 to 0.5) – Group B
- Severe subluxation (<0) – Group C
Outcome was assessed radiologically by way of the Severin score: in group C only 1 of 13 hips (8%) had an excellent result, in group B 20 of 44 hips (45%) and in group A 39 of 76 hips (51%). Using Fisher's exact test, a statistically significant association was shown between group and subsequent outcome (p=0.001). A significant result was also found when the three ONCEA groups were compared using the McKay classification (a clinical outcome measure).
The ONCEA/AI ratio was also used in order to take the degree of acetabular coverage into account. It gave similar statistically significant results to those described above for the ONCEA, confirming the findings.
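The grouping rules above translate directly into a small classifier; the sketch below simply encodes the stated ONCEA and ONCEA/AI cut-offs and implies no clinical logic beyond them.

```python
# Encodes the ONCEA and ONCEA/AI grouping rules stated above; no clinical
# logic beyond those cut-offs is implied.
def oncea_group(oncea_degrees: float) -> str:
    """Group by ossific nucleus centre edge angle (degrees)."""
    if oncea_degrees >= 10:
        return "A (reduced)"
    if oncea_degrees > -10:          # -9 to 9 degrees
        return "B (mild subluxation)"
    return "C (severe subluxation)"

def oncea_ai_group(ratio: float) -> str:
    """Group by ONCEA / acetabular index ratio."""
    if ratio > 0.5:
        return "A (reduced)"
    if ratio >= 0:                   # 0 to 0.5
        return "B (mild subluxation)"
    return "C (severe subluxation)"

print(oncea_group(14))       # A (reduced)
print(oncea_group(-3))       # B (mild subluxation)
print(oncea_ai_group(-0.2))  # C (severe subluxation)
```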
In conclusion, the ONCEA or ONCEA/AI ratio can be used at an early age (within six months following removal of POP after reduction, at a mean of 18 months of age) for two purposes:
1. To prognosticate the medium and long-term outcome of the patient.
2. To enable the clinician to determine whether a secondary procedure should be performed at an earlier age than usual. A prospective study will be necessary to confirm this.
|
370 |
Causality, Prediction, and Replicability in Applied Statistics: Advanced Models and Practices. Pütz, Peter. 03 May 2019
No description available.
|