161

Improved effort estimation of software projects based on metrics

Andersson, Veronika, Sjöstedt, Hanna January 2005 (has links)
Saab Ericsson Space AB develops space products for a predetermined price. Since the price is fixed, a reliable prediction model for the effort needed to develop a product is crucial. Software effort estimation is difficult in general, and it is a recognized problem at the software department. By analyzing metrics collected from former projects, different prediction models are developed to estimate the number of person hours a software project will require. Models for predicting the effort before a project begins are developed first; only a few variables are known at this stage of a project. These models are compared to the model currently used at the company. Linear regression models reduce the estimation error by nine percentage points, and nonlinear regression models improve the result even further. The current model is also recalibrated to improve its predictions, and a principal component regression model is developed as well. In addition, a model for refining the estimate during an ongoing project is developed; since this is a new approach, comparison with the initial estimate is the only available evaluation. The result is an improved prediction model: several models perform better than the one used today. The discussion weighs the positive and negative aspects of the models, leading to the choice of a model recommended for future use.
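A minimal sketch of the kind of regression-based effort model the abstract describes, using numpy only. The metric names, project values, and coefficients are hypothetical illustrations, not data or models from Saab Ericsson Space.

```python
import numpy as np

# Hypothetical project metrics: [new code size (kSLOC), number of requirements]
X = np.array([[12.0, 40], [8.5, 25], [20.0, 60], [15.0, 45], [5.0, 15]])
hours = np.array([3100.0, 2100.0, 5200.0, 3900.0, 1300.0])  # observed person hours

# Ordinary least squares: hours ~ b0 + b1*kSLOC + b2*requirements
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, hours, rcond=None)

# Effort prediction for a new project (illustrative values only)
new_project = np.array([1.0, 10.0, 30.0])   # [intercept term, kSLOC, requirements]
print(f"predicted effort: {new_project @ coef:.0f} person hours")
```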
162

Bayesian Semiparametric Models for Heterogeneous Cross-platform Differential Gene Expression

Dhavala, Soma Sekhar 2010 December 1900 (has links)
We are concerned with testing for differential expression and consider three different aspects of such testing procedures. First, we develop an exact ANOVA type model for discrete gene expression data, produced by technologies such as Massively Parallel Signature Sequencing (MPSS), Serial Analysis of Gene Expression (SAGE) or other next generation sequencing technologies. We adopt two Bayesian hierarchical models: one parametric, the other semiparametric with a Dirichlet process prior that has the ability to borrow strength across related signatures, where a signature is a specific arrangement of the nucleotides. We utilize the discreteness of the Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches, while controlling the false discovery rate. Next, we consider ways to combine expression data from different studies, possibly produced by different technologies resulting in mixed-type responses, such as Microarrays and MPSS. Depending on the technology, the expression data can be continuous or discrete and can have different technology-dependent noise characteristics. Adding to the difficulty, genes can have an arbitrary correlation structure both within and across studies. Performing several hypothesis tests for differential expression could also lead to false discoveries. We propose to address all the above challenges using a hierarchical Dirichlet process with a spike-and-slab base prior on the random effects, while smoothing splines model the unknown link functions that map different technology-dependent manifestations to latent processes upon which inference is based. Finally, we propose an algorithm for controlling different error measures in Bayesian multiple testing under generic loss functions, including the widely used uniform loss function. We do not make any specific assumptions about the underlying probability model, but require that indicator variables for the individual hypotheses are available as a component of the inference. Given this information, we recast multiple hypothesis testing as a combinatorial optimization problem, in particular the 0-1 knapsack problem, which can be solved efficiently using a variety of algorithms, both approximate and exact in nature.
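A hedged sketch of the knapsack reformulation mentioned at the end of the abstract. Each candidate rejection is treated as an item whose value is the posterior probability of being non-null and whose weight is the posterior probability of a false discovery, with a cap on the expected number of false discoveries. The posterior probabilities, the integer scaling, and this particular error budget are illustrative stand-ins for the generic loss functions and error measures used in the dissertation.

```python
import numpy as np

# Posterior probabilities that each hypothesis is non-null (illustrative values)
post = np.array([0.95, 0.90, 0.70, 0.40, 0.20, 0.05])

# Knapsack items: value = posterior probability of a true discovery,
# weight = posterior probability of a false discovery (1 - post),
# capacity = budget on the expected number of false discoveries.
budget = 0.5
scale = 100                                   # scale weights to integers for the DP
w = np.round((1 - post) * scale).astype(int)
W = int(budget * scale)

n = len(post)
dp = np.zeros((n + 1, W + 1))
for i in range(1, n + 1):
    for c in range(W + 1):
        dp[i, c] = dp[i - 1, c]
        if w[i - 1] <= c:
            dp[i, c] = max(dp[i, c], dp[i - 1, c - w[i - 1]] + post[i - 1])

# Backtrack to recover which hypotheses to reject
reject, c = [], W
for i in range(n, 0, -1):
    if dp[i, c] != dp[i - 1, c]:
        reject.append(i - 1)
        c -= w[i - 1]
print("rejected hypotheses:", sorted(reject))
```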
163

Testing Lack-of-Fit of Generalized Linear Models via Laplace Approximation

Glab, Daniel Laurence 2011 May 1900 (has links)
In this study we develop a new method for testing the null hypothesis that the predictor function in a canonical link regression model has a prescribed linear form. The class of models, which we will refer to as canonical link regression models, constitutes arguably the most important subclass of generalized linear models and includes several of the most popular generalized linear models. In addition to the primary contribution of this study, we revisit several other tests in the existing literature. The common feature among the proposed test and the existing tests is that they are all based on orthogonal series estimators used to detect departures from a null model. Our proposal for a new lack-of-fit test is inspired by the recent contribution of Hart and is based on a Laplace approximation to the posterior probability of the null hypothesis. Despite having a Bayesian construction, the resulting statistic is implemented in a frequentist fashion. The formulation of the statistic is based on characterizing departures from the predictor function in terms of Fourier coefficients and subsequently testing that all of these coefficients are 0. The resulting test statistic can be characterized as a weighted sum of exponentiated squared Fourier coefficient estimators, where the weights depend on user-specified prior probabilities. The prior probabilities provide the investigator with the flexibility to examine specific departures from the prescribed model. Alternatively, the use of noninformative priors produces a new omnibus lack-of-fit statistic. We present a thorough numerical study of the proposed test and the various existing orthogonal series-based tests in the context of the logistic regression model. Simulation studies demonstrate that the test statistics under consideration possess desirable power properties against alternatives that have been identified in the existing literature as being important.
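A rough numpy illustration of the structure of the statistic described above: a prior-weighted sum of exponentiated squared Fourier coefficient estimators computed from a null logistic fit. The simulated data, the cosine basis, the residual standardization, and the uniform prior weights are assumptions made for this sketch; the exact weighting and scaling used in the dissertation differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: x in [0, 1], binary response from a logistic model
n = 200
x = rng.uniform(0, 1, n)
p_true = 1 / (1 + np.exp(-(0.5 + 1.0 * x)))
y = rng.binomial(1, p_true)

# Fit the null (linear-in-x) logistic model by simple Newton/IRLS iterations
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    Wd = p * (1 - p)
    beta += np.linalg.solve(X.T * Wd @ X, X.T @ (y - p))

# Standardized residuals and their cosine-basis Fourier coefficient estimates
resid = (y - p) / np.sqrt(p * (1 - p))
K = 5
phi_hat = np.array([np.mean(np.sqrt(2) * np.cos(np.pi * k * x) * resid)
                    for k in range(1, K + 1)])

# Prior-weighted sum of exponentiated squared coefficients (structure only;
# the prior weights pi_k and the exact scaling are user/thesis-specific choices)
pi_k = np.full(K, 1.0 / K)
T = np.sum(pi_k * np.exp(n * phi_hat**2 / 2))
print("lack-of-fit statistic (illustrative):", T)
```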
164

Parameter Estimation In Generalized Partial Linear Models With Conic Quadratic Programming

Celik, Gul 01 September 2010 (has links) (PDF)
In statistics, regression analysis is a technique used to understand and model the relationship between a dependent variable and one or more independent variables. Multivariate Adaptive Regression Splines (MARS) is a form of regression analysis. It is a non-parametric regression technique and can be seen as an extension of linear models that automatically models non-linearities and interactions. MARS is very important in both classification and regression, with an increasing number of applications in many areas of science, economy and technology. In our study, we analyzed Generalized Partial Linear Models (GPLMs), which are particular semiparametric models. GPLMs separate the input variables into two parts and additively integrate a classical linear model with a nonlinear model part. In order to smooth this nonparametric part, we use Conic Multivariate Adaptive Regression Splines (CMARS), a modified form of MARS. MARS is very beneficial for high dimensional problems and does not require any particular class of relationship between the regressor variables and the outcome variable of interest. This technique offers a great advantage for fitting nonlinear multivariate functions. Also, the contribution of the basis functions can be estimated by MARS, so that both the additive and interaction effects of the regressors are allowed to determine the dependent variable. There are two steps in the MARS algorithm: the forward and backward stepwise algorithms. In the first step, the model is constructed by adding basis functions until a maximum level of complexity is reached. Conversely, in the second step, the backward stepwise algorithm reduces the complexity by removing the least significant basis functions from the model. In this thesis, we suggest not using the backward stepwise algorithm; instead, we employ a Penalized Residual Sum of Squares (PRSS). We construct the PRSS for MARS as a Tikhonov regularization problem. We treat this problem using continuous optimization techniques, which we consider to be an important complementary technology and alternative to the concept of the backward stepwise algorithm. In particular, we apply the elegant framework of Conic Quadratic Programming (CQP), an area of convex optimization that is very well structured, thereby resembling linear programming and therefore permitting the use of interior point methods. At the end of this study, we compare CQP with the Tikhonov regularization problem on two different data sets, one with and one without interaction effects. Moreover, using two other data sets, we make a comparison between CMARS and two other classification methods, Infinite Kernel Learning (IKL) and Tikhonov regularization, whose results are obtained from a thesis still in progress.
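A small numpy sketch of the penalized-residual-sum-of-squares idea: a MARS-style hinge basis with fixed knots fitted under a Tikhonov penalty. The data, knots, and penalty weight are hypothetical, and the closed-form ridge solution below stands in for the conic quadratic programming formulation and interior point solvers used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a smooth nonlinear signal observed with noise
n = 120
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

# MARS-style hinge basis with fixed, evenly spaced knots (the real forward
# pass selects knots adaptively; fixed knots keep the sketch short)
knots = np.linspace(0.1, 0.9, 9)
B = np.column_stack([np.ones(n)] +
                    [np.maximum(x - t, 0) for t in knots] +
                    [np.maximum(t - x, 0) for t in knots])

# Penalized residual sum of squares: ||y - B c||^2 + lam * ||L c||^2,
# a Tikhonov-regularized problem; the thesis solves a CQP reformulation,
# here the closed-form ridge solution is used for illustration.
lam = 1.0
L = np.eye(B.shape[1]); L[0, 0] = 0.0      # do not penalize the intercept
coef = np.linalg.solve(B.T @ B + lam * L.T @ L, B.T @ y)
fitted = B @ coef
print("residual sum of squares:", float(np.sum((y - fitted) ** 2)))
```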
165

Log-linear Rasch-type models for repeated categorical data with a psychobiological application

Hatzinger, Reinhold, Katzenbeisser, Walter January 2008 (has links) (PDF)
The purpose of this paper is to generalize regression models for repeated categorical data based on maximizing a conditional likelihood. Some existing methods, such as those proposed by Duncan (1985), Fischer (1989), and Agresti (1993, and 1997) are special cases of this latent variable approach, used to account for dependencies in clustered observations. The generalization concerns the incorporation of rather general data structures such as subject-specific time-dependent covariates, a variable number of observations per subject and time periods of arbitrary length in order to evaluate treatment effects on a categorical response variable via a linear parameterization. The response may be polytomous, ordinal or dichotomous. The main tool is the log-linear representation of appropriately parameterized Rasch-type models, which can be fitted using standard software, e.g., R. The proposed method is applied to data from a psychiatric study on the evaluation of psychobiological variables in the therapy of depression. The effects of plasma levels of the antidepressant drug Clomipramine and neuroendocrinological variables on the presence or absence of anxiety symptoms in 45 female patients are analyzed. The individual measurements of the time dependent variables were recorded on 2 to 11 occasions. The findings show that certain combinations of the variables investigated are favorable for the treatment outcome. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
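A hedged sketch of the log-linear representation idea for a very small cross-section: an extended Rasch-type model for three dichotomous items, fitted as a Poisson log-linear model. The response-pattern counts are invented, statsmodels is used here instead of the R-based fits mentioned in the abstract, and this static three-item design is far simpler than the repeated-measures structures handled in the paper; it only illustrates that such models reduce to standard log-linear fits.

```python
import numpy as np
import statsmodels.api as sm
from itertools import product

# All response patterns for three dichotomous items, with invented counts
patterns = np.array(list(product([0, 1], repeat=3)))        # 8 x 3
counts = np.array([30, 12, 15, 10, 20, 14, 18, 25])         # hypothetical frequencies

# Log-linear Rasch-type design: item effects (item 1 as reference)
# plus raw-score parameters (score 0 as reference)
score = patterns.sum(axis=1)
design = np.column_stack([
    np.ones(len(patterns)),          # intercept
    patterns[:, 1], patterns[:, 2],  # item 2 and item 3 contrasts vs item 1
    (score == 1).astype(float),      # raw-score parameters
    (score == 2).astype(float),
    (score == 3).astype(float),
])

# Fit the log-linear representation as a Poisson GLM
fit = sm.GLM(counts, design, family=sm.families.Poisson()).fit()
print(fit.params)   # item contrasts and score parameters on the log scale
```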
166

Ideology and interests : a hierarchical Bayesian approach to spatial party preferences

Mohanty, Peter Cushner 04 December 2013 (has links)
This paper presents a spatial utility model of support for multiple political parties. The model includes a "valence" term, which I reparameterize to include both party competence and the voters' key sociodemographic concerns. The paper shows how this spatial utility model can be interpreted as a hierarchical model using data from the 2009 European Elections Study. I estimate this model via Bayesian Markov Chain Monte Carlo (MCMC) using a block Gibbs sampler and show that the model can capture broad European-wide trends while allowing for significant amounts of heterogeneity. This approach, however, which assumes a normal dependent variable, is only able to partially reproduce the data generating process. I show that the data generating process can be reproduced more accurately with an ordered probit model. Finally, I discuss trade-offs between parsimony and descriptive richness and other practical challenges that may be encountered when building models of party support and make recommendations for capturing the best of both approaches. / text
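A minimal Gibbs-sampler sketch in the spirit of the hierarchical approach described above, written for a plain normal hierarchical model (group means with a common population mean) rather than the paper's spatial utility model of party support. The priors, the simulated data, and the fixed variances are assumptions chosen to keep the conditional draws conjugate and the block structure visible.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical grouped data: J "countries", each with n_j observed support scores
J, n_j = 5, 40
true_theta = rng.normal(0.0, 1.0, J)
y = [rng.normal(true_theta[j], 1.0, n_j) for j in range(J)]

sigma2, tau2 = 1.0, 1.0       # within-group and between-group variances (fixed here)
theta = np.zeros(J)           # group-level effects
mu = 0.0                      # population mean

draws = []
for it in range(2000):
    # Block 1: group effects given the population mean
    for j in range(J):
        prec = n_j / sigma2 + 1.0 / tau2
        mean = (np.sum(y[j]) / sigma2 + mu / tau2) / prec
        theta[j] = rng.normal(mean, np.sqrt(1.0 / prec))
    # Block 2: population mean given the group effects (flat prior on mu)
    mu = rng.normal(np.mean(theta), np.sqrt(tau2 / J))
    draws.append(mu)

print("posterior mean of mu:", np.mean(draws[500:]))
```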
167

Data Augmentation and Dynamic Linear Models

Frühwirth-Schnatter, Sylvia January 1992 (has links) (PDF)
We define a subclass of dynamic linear models with unknown hyperparameters called d-inverse-gamma models. We then approximate the marginal p.d.f.s of the hyperparameter and the state vector by the data augmentation algorithm of Tanner/Wong. We prove that the regularity conditions for convergence hold. A sampling based scheme for practical implementation is discussed. Finally, we illustrate how to obtain an iterative importance sampling estimate of the model likelihood. (author's abstract) / Series: Forschungsberichte / Institut für Statistik
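A rough sketch of the alternation at the heart of data augmentation for a state-space model: a local level model in which the sampler cycles between the states given the variances and the two variances given the states. The inverse-gamma priors and the single-site state updates below are simplifying assumptions; the paper works with the general d-inverse-gamma class and the Tanner/Wong scheme rather than this toy version.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated local level model: y_t = mu_t + v_t,  mu_t = mu_{t-1} + w_t
T = 100
true_state = np.cumsum(rng.normal(0, 0.3, T))
y = true_state + rng.normal(0, 0.5, T)

mu = y.copy()                 # initialize states at the observations
sv, sw = 1.0, 1.0             # observation and state variances
a, b = 2.0, 1.0               # inverse-gamma prior hyperparameters (assumed)

for it in range(1000):
    # States given variances: single-site conjugate normal updates
    for t in range(T):
        prec, mean_num = 1.0 / sv, y[t] / sv
        if t > 0:
            prec += 1.0 / sw
            mean_num += mu[t - 1] / sw
        if t < T - 1:
            prec += 1.0 / sw
            mean_num += mu[t + 1] / sw
        mu[t] = rng.normal(mean_num / prec, np.sqrt(1.0 / prec))
    # Variances given states: inverse-gamma full conditionals
    sv = 1.0 / rng.gamma(a + T / 2, 1.0 / (b + 0.5 * np.sum((y - mu) ** 2)))
    sw = 1.0 / rng.gamma(a + (T - 1) / 2, 1.0 / (b + 0.5 * np.sum(np.diff(mu) ** 2)))

print("last draws: sv =", round(sv, 3), " sw =", round(sw, 3))
```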
168

Linear time series models and autocorrelation

Γαζή, Σταυρούλα 07 July 2015 (has links)
The purpose of this master's thesis is twofold: it concerns the study of the simple and generalized multiple regression model when one of the Gauss-Markov conditions is violated, specifically when Cov{ε_i, ε_j} ≠ 0, ∀ i ≠ j, and the analysis of time series. First, the simple and multiple linear regression models are briefly reviewed, together with the properties and estimation of the regression coefficients. The properties of the error terms, such as their mean, variance and correlation coefficients, are then described when the assumption on their covariance is violated. Finally, the Durbin-Watson test for autocorrelation of the error terms is presented, along with a variety of corrective measures aimed at eliminating it. In the second part, basic concepts of time series theory are introduced first. Various stationary time series are then analyzed: starting from white noise, moving average (MA) series, autoregressive (AR) series and ARMA series are presented, as well as the general case of non-stationary series, the ARIMA series, and the first stages of the analysis of a time series are briefly outlined for each of these cases. This work was based on two important books by distinguished scholars: Introduction to Econometrics by George K. Christou, and Applied Linear Regression Models by John Neter, Michael H. Kutner, Christofer J. Nachtsheim and William Wasserman.
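A short numpy illustration of the Durbin-Watson statistic discussed in the first part of the thesis, computed for residuals from a simple regression fitted to data with autocorrelated errors; the simulated data and the AR(1) error coefficient are of course hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate a regression with AR(1) errors: e_t = 0.7 * e_{t-1} + u_t
n = 200
x = np.linspace(0, 10, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal(0, 1.0)
y = 2.0 + 1.5 * x + e

# Fit by ordinary least squares and take the residuals
A = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef

# Durbin-Watson statistic: d = sum (e_t - e_{t-1})^2 / sum e_t^2.
# Values near 2 indicate no autocorrelation; values well below 2 indicate
# positive autocorrelation, as induced here.
d = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print("Durbin-Watson d =", round(d, 3))
```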
169

MULTIVARIATE MEASURE OF AGREEMENT

Towstopiat, Olga Michael January 1981 (has links)
Reliability issues are always salient as behavioral researchers observe human behavior and classify individuals from criterion-referenced test scores. This has created a need for studies to assess agreement between observers recording the occurrence of various behaviors, in order to establish the reliability of their classifications. In addition, there is a need for measuring the consistency of dichotomous and polytomous classifications established from criterion-referenced test scores. The development of several log-linear univariate models for measuring agreement has partially met the demand for a probability-based measure of agreement with a directly interpretable meaning. However, multivariate repeated measures agreement procedures are necessary because of the development of complex intrasubject and intersubject research designs. The present investigation developed applications of the log-linear, latent class, and weighted least squares procedures for the analysis of multivariate repeated measures designs. These computations tested the model-data fit and calculated the multivariate measure of the magnitude of agreement under the quasi-equiprobability and quasi-independence models. Applications of these computations were illustrated with real and hypothetical observational data. It was demonstrated that employing log-linear, latent class, and weighted least squares computations resulted in identical multivariate model-data fits with equivalent chi-square values. Moreover, the application of these three methodologies also produced identical measures of the degree of agreement at each point in time and for the multivariate average. The multivariate methods that were developed also included procedures for measuring the probability of agreement for a single response classification or a subset of classifications from a larger set. In addition, procedures were developed to analyze occurrences of systematic observed disagreement within the multivariate tables. The consistency of dichotomous and polytomous classifications over repeated assessments of the identical examinees was also suggested as a means of conceptualizing criterion-referenced reliability. By applying the univariate and multivariate models described, the reliability of these classifications across repeated testings could be calculated. The procedures utilizing the log-linear, latent structure, and weighted least squares concepts for the purpose of measuring agreement have the advantages of (1) yielding a coefficient of agreement that varies between zero and one and measures agreement in terms of the probability that the observers' judgements will agree, as estimated under a quasi-equiprobability or quasi-independence model, (2) correcting for the proportion of "chance" agreement, and (3) providing a directly interpretable coefficient of "no agreement." Thus, these multivariate procedures may be regarded as a more refined psychometric technology for measuring inter-observer agreement and criterion-referenced test reliability.
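A hedged sketch of one ingredient mentioned above: fitting a quasi-independence log-linear model to a two-rater agreement table as a Poisson GLM, with diagonal parameters capturing agreement beyond what the row and column margins alone would produce. The 3x3 table of counts is hypothetical, statsmodels is a modern stand-in for the original computations, and the single-table setup is much simpler than the multivariate repeated-measures designs developed in the dissertation.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical 3x3 cross-classification of two observers' ratings
table = np.array([[40,  5,  2],
                  [ 6, 35,  4],
                  [ 1,  3, 30]])
K = table.shape[0]
counts = table.ravel()

# Design for quasi-independence: row effects, column effects, and one
# parameter per diagonal cell (category 1 as reference for rows/columns)
rows, cols = np.divmod(np.arange(K * K), K)
design = np.column_stack(
    [np.ones(K * K)] +
    [(rows == r).astype(float) for r in range(1, K)] +
    [(cols == c).astype(float) for c in range(1, K)] +
    [((rows == k) & (cols == k)).astype(float) for k in range(K)]
)

fit = sm.GLM(counts, design, family=sm.families.Poisson()).fit()

# Observed agreement and the fitted diagonal (agreement) parameters
observed_agreement = np.trace(table) / table.sum()
print("observed agreement:", round(observed_agreement, 3))
print(fit.params[-K:])   # log-scale diagonal agreement parameters
```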
170

Flexible statistical modeling of deaths by diarrhoea in South Africa.

Mbona, Sizwe Vincent. 17 December 2013 (has links)
The purpose of this study is to investigate and understand data which are grouped into categories. Various statistical methods were studied for categorical binary responses to investigate the causes of death from diarrhoea in South Africa. Data collected included death type, sex, marital status, province of birth, province of death, place of death, province of residence, education status, smoking status and pregnancy status. The objective of this thesis is to investigate which of the above explanatory variables are most strongly associated with death from diarrhoea in South Africa. To achieve this objective, different sample survey data analysis techniques are investigated. This includes sketching bar graphs and using several statistical methods, namely logistic regression, surveylogistic, the generalised linear model, the generalised linear mixed model, and the generalised additive model. In the selection of the fixed effects, bar graphs and individual profile plots of the response variable are used. A logistic regression model is used to identify which of the explanatory variables are most strongly associated with death from diarrhoea. Statistical analyses are conducted in SAS (Statistical Analysis Software). Hosmer and Lemeshow (2000) propose a statistic that they show, through simulation, is distributed as chi-square when there is no replication in any of the subpopulations. Because of its similarity to the Hosmer and Lemeshow test for logistic regression, Parzen and Lipsitz (1999) suggest using 10 risk score groups. Nevertheless, based on simulation results, May and Hosmer (2004) show that, for small samples or samples with a large percentage of censored observations, the test rejects the null hypothesis too often. They suggest that the number of groups be chosen such that G=integer of {maximum of 12 and minimum of 10}. Lemeshow et al. (2004) state that the observations are firstly sorted in increasing order of their estimated event probability. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2013.
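Since the abstract leans on the Hosmer-Lemeshow idea of grouping observations by estimated risk, here is a hedged sketch of that grouping and the resulting chi-square type statistic on simulated data. The covariates, the choice of G = 10 groups, and the use of plain logistic regression (rather than the survey-adjusted and mixed models examined in the thesis) are assumptions of the sketch.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(5)

# Simulated binary outcome (e.g. death by diarrhoea: yes/no) with two covariates
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.binomial(1, 0.4, n)])
p_true = 1 / (1 + np.exp(-(X @ np.array([-1.0, 0.8, 0.5]))))
y = rng.binomial(1, p_true)

# Fit a logistic regression and get estimated event probabilities
fit = sm.Logit(y, X).fit(disp=0)
p_hat = fit.predict(X)

# Hosmer-Lemeshow: sort by estimated risk, form G groups, and compare
# observed and expected events within each group
G = 10
order = np.argsort(p_hat)
groups = np.array_split(order, G)
C = 0.0
for g in groups:
    n_g = len(g)
    obs = y[g].sum()
    exp_ = p_hat[g].sum()
    pbar = exp_ / n_g
    C += (obs - exp_) ** 2 / (n_g * pbar * (1 - pbar))

print("Hosmer-Lemeshow C =", round(C, 2),
      " p-value =", round(chi2.sf(C, G - 2), 3))
```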
