About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD).
Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

Bayesian regression and discrimination with many variables

Chang, Kai-Ming January 2002 (has links)
No description available.
2

Generalizing the multivariate normality assumption in the simulation of dependencies in transportation systems

Ng, Man Wo 22 November 2010 (has links)
By far the most popular method to account for dependencies in the transportation network analysis literature is the use of the multivariate normal (MVN) distribution. While in certain cases there is some theoretical underpinning for the MVN assumption, in others there is none. This can lead to misleading results: results depend not only on whether dependence is modeled, but also on how it is modeled. Assuming the MVN distribution restricts the analysis to a specific set of dependence structures, which can substantially limit the validity of the results. In this report an existing, more flexible, correlation-based approach (in which only the marginal distributions and their correlations are specified) is proposed, and it is demonstrated that, in simulation studies, such an approach generalizes the MVN assumption. The need for such a generalization is particularly critical in the transportation network modeling literature, where there is often no data, or insufficient data, to estimate probability distributions, so that sensitivity analyses assuming different dependence structures can be extremely valuable. The proposed method has its own drawbacks, however: it still cannot exhaust all possible dependence forms, and it relies on some lesser-known properties of the correlation coefficient.
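The correlation-based approach the abstract describes can be sketched with a Gaussian-copula (NORTA-style) construction: draw correlated standard normals, push them through the normal CDF to uniforms, and then through the inverse CDFs of the desired marginals. The marginals, correlation value, and variable interpretations below are illustrative assumptions, not taken from the report.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100_000
rho = 0.7  # correlation of the underlying normals (approximates the target)

# Correlated bivariate standard normals.
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

# Transform to uniforms, then to the target (non-normal) marginals.
u = stats.norm.cdf(z)
x1 = stats.expon(scale=10.0).ppf(u[:, 0])   # e.g., a hypothetical link travel time
x2 = stats.lognorm(s=0.5).ppf(u[:, 1])      # e.g., a hypothetical capacity factor

# The marginals are exponential and lognormal, yet the variables remain
# positively dependent; the achieved Pearson correlation is close to,
# but not exactly, rho -- one of the subtleties the abstract alludes to.
print(np.corrcoef(x1, x2)[0, 1])
```

The gap between the specified and achieved correlation is exactly the kind of "lesser-known property of the correlation coefficient" that the report flags as a drawback.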
3

Bayesian Logistic Regression Model with Integrated Multivariate Normal Approximation for Big Data

Fu, Shuting 28 April 2016 (has links)
The analysis of big data is of great interest today, and it comes with the challenge of improving precision and efficiency in estimation and prediction. We study binary data with covariates from numerous small areas, where direct estimation is not reliable and there is a need to borrow strength from the ensemble. This is generally done using Bayesian logistic regression, but because there are numerous small areas, exact computation for the logistic regression model becomes challenging. Therefore, we develop an integrated multivariate normal approximation (IMNA) method for binary data with covariates within the Bayesian paradigm, a procedure assisted by the empirical logistic transform. Our main goal is to provide the theory of IMNA and to show that it is many times faster than the exact logistic regression method with almost the same accuracy. We apply the IMNA method to the health status binary data (excellent health or otherwise) from the Nepal Living Standards Survey with more than 60,000 households (small areas), estimating the proportion of Nepalese in excellent health for each household. For these data, IMNA gives estimates of the household proportions as precise as those from the logistic regression model, and it is more than fifty times faster (20 seconds versus 1,066 seconds); this gain clearly transfers to larger problems.
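The empirical logistic transform mentioned above turns area-level binomial counts into approximately normal quantities, which is what makes a multivariate normal approximation possible in place of the exact logistic likelihood. Here is a minimal sketch of the transform with its standard approximate variance; the counts are hypothetical, and the full IMNA procedure from the thesis is not reproduced.

```python
import numpy as np

def empirical_logistic_transform(y, n):
    """Empirical logistic transform of binomial counts y out of n trials.

    Returns the transformed value z and its approximate sampling variance v.
    z is roughly normal even for small counts, thanks to the 0.5 offsets.
    """
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    z = np.log((y + 0.5) / (n - y + 0.5))
    v = 1.0 / (y + 0.5) + 1.0 / (n - y + 0.5)
    return z, v

# Hypothetical small-area counts: households reporting excellent health.
y = np.array([3, 0, 5, 2])
n = np.array([6, 4, 5, 8])
z, v = empirical_logistic_transform(y, n)
print(z)  # z[0] = log(3.5 / 3.5) = 0, since exactly half reported excellent health
```

Note that the transform is well defined even when y = 0 or y = n, which the raw log-odds log(y/(n-y)) is not.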
4

Modelo de calibração ultraestrutural / Ultrastructural calibration model

Talarico, Alina Marcondes 23 January 2014 (has links)
Proficiency Testing (PT) programs are used by society to assess the competence and reliability of laboratories in carrying out specific measurements. Many PT groups have been established by INMETRO, among them the engine testing group. Each group is formed by laboratories that measure the same artifact, and their measurements are compared through statistical methods. The engine group chose a 1.0 gasoline engine, kindly provided by GM Powertrain, as the artifact. The artifact's power was measured at ten rotation points by six laboratories. Here, motivated by this data set, we extend the comparative calibration model of Barnett (1969) to assess the compatibility of the laboratories under the Student-t distribution, and we present the results obtained from applications and simulations on this data set.
6

Explicit Estimators for a Banded Covariance Matrix in a Multivariate Normal Distribution

Karlsson, Emil January 2014 (has links)
The problem of estimating the mean and covariance of a multivariate normally distributed random vector has been studied in many forms. This thesis focuses on the estimators proposed in [15] for a banded covariance structure with m-dependence. It presents the previous results for the estimator and rewrites the estimator when m = 1, making it easier to analyze. This leads to an adjustment, and a proposition for an unbiased estimator can be presented. A new and simpler proof of consistency is then given. This theory is later generalized to a general linear model, where the corresponding theorems and propositions establish unbiasedness and consistency. In the last chapter, simulations with the previous and the new estimator verify that the theoretical results indeed make an impact.
7

Imputation techniques for non-ordered categorical missing data

Karangwa, Innocent January 2016 (has links)
Philosophiae Doctor - PhD / Missing data are common in survey data sets: enrolled subjects often do not have data recorded for all variables of interest. Handling missing data inappropriately may bias the estimates and lead to incorrect inferences, so special attention is needed when analysing incomplete data. Multivariate normal imputation (MVNI) and multiple imputation by chained equations (MICE) have emerged as the leading techniques for imputing, or filling in, missing data. The former assumes a normal distribution for the variables in the imputation model, but can also handle missing data whose distributions are not normal. The latter fills in missing values taking into account the distributional form of each variable to be imputed. The aim of this study was to determine the performance of these methods when data are missing at random (MAR) or missing completely at random (MCAR) on unordered (nominal) categorical variables treated as predictors or response variables in regression models. Both dichotomous and polytomous variables were considered in the analysis. The baseline data were the 2007 Demographic and Health Survey (DHS) from the Democratic Republic of Congo. The analysis model of interest was the logistic regression of a woman's contraceptive-method-use status on her marital status, controlling or not for other covariates (continuous, nominal and ordinal). Based on the data set with missing values, data sets with observations missing at random or completely at random on either the covariates or the nominal response variables were first simulated, and then used for imputation purposes. Under the MVNI method, unordered categorical variables were first dichotomised, and then K − 1 dichotomised variables (where K is the number of levels of the categorical variable of interest) were included in the imputation model, leaving the remaining category as a reference.
These variables were imputed as continuous variables using a linear regression model. Imputation with MICE considered the distributional form of each variable to be imputed: imputations were drawn using binary and multinomial logistic regressions for dichotomous and polytomous variables, respectively. The performance of these methods was evaluated in terms of bias and standard errors of the regression coefficients estimated to determine the association between a woman's contraceptive-method-use status and her marital status, controlling or not for other types of variables. The analysis was first done assuming an unweighted sample; the sample weights were then taken into account to assess whether the sample design affects the performance of the two multiple imputation methods of interest, MVNI and MICE. As expected, the results showed that for all the models, MVNI and MICE produced less biased estimates and smaller standard errors than the case-deletion (CD) method, which discards cases with missing values from the analysis. Moreover, when data were missing (MCAR or MAR) on the nominal variables treated as predictors in the regression model, MVNI reduced bias in the regression coefficients and standard errors compared with MICE, for both unweighted and weighted data sets. On the other hand, MICE outperformed MVNI when data were missing on the response variables, whether binary or polytomous. Furthermore, the sample design (sample weights), the rates of missingness and the missing data mechanisms (MCAR or MAR) did not affect the behaviour of the multiple imputation methods considered in this study. Based on these results, it can be concluded that when missing values are present on outcome variables measured on a nominal scale in regression models, the distributional form of the variable with missing values should be taken into account.
When such variables are used as predictors (with missing observations), the parametric imputation approach (MVNI) would be the better option.
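The K − 1 dichotomisation step described above can be sketched in a few lines: a nominal variable with K levels becomes K − 1 indicator columns, with the omitted level acting as the reference category, and those indicators then enter the MVNI imputation model as if continuous. The variable name and levels here are hypothetical stand-ins for the DHS covariates.

```python
import pandas as pd

# Hypothetical nominal variable with K = 3 levels, as in the MVNI setup:
# K - 1 = 2 indicator columns enter the imputation model, with the
# omitted level ('married', the first alphabetically) as the reference.
df = pd.DataFrame({"marital_status": ["married", "single", "widowed",
                                      "single", "married"]})
dummies = pd.get_dummies(df["marital_status"], prefix="ms", drop_first=True)
print(list(dummies.columns))  # ['ms_single', 'ms_widowed']
```

After imputation on these continuous-looking indicators, MVNI-style workflows typically map the imputed values back to a single category, which is part of what the thesis compares against MICE's direct multinomial draws.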
8

An Investigation into Classification of High Dimensional Frequency Data

McGraw, John M. 25 October 2001 (has links)
We desire an algorithm to classify a physical object in "real time" using an easily portable probing device. The probe excites a given object at frequencies from 100 MHz up to 800 MHz at intervals of 0.5 MHz, so the data used for classification is the 1400-component vector of these frequency responses. The Interdisciplinary Center for Applied Mathematics (ICAM) was asked to help develop an algorithm and executable computer code for the probing device to use in its classification analysis. Due to these and other requirements, all work had to be done in Matlab; hence a significant portion of the effort was spent writing and testing Matlab code implementing the various statistical techniques. We offer three approaches to classification: maximum log-likelihood estimates, correlation coefficients, and confidence bands. Related work included considering ways to recover and exploit certain symmetry characteristics of the objects (using the response data). Present investigations are not entirely conclusive, but the correlation coefficient classifier seems to produce reasonable and consistent results. All three methods currently require the evaluation of the full 1400-component vector. It has been suggested that unknown portions of the vectors may include extraneous and misleading information, or information common to all classes; identifying and removing the respective components may benefit classification regardless of method. Another advantage of dimension reduction should be a strengthening of mean and covariance estimates. / Master of Science
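The correlation-coefficient classifier the abstract favors can be sketched simply: assign an unknown response vector to the class whose mean response it correlates with most strongly. The class shapes below are synthetic stand-ins for the 1400-component frequency responses, and the original work was in Matlab; this is a Python sketch of the same idea.

```python
import numpy as np

def correlation_classifier(x, class_means):
    """Assign x to the class whose mean response it correlates with most.

    class_means: dict mapping class label -> mean frequency-response vector.
    """
    scores = {label: np.corrcoef(x, mu)[0, 1]
              for label, mu in class_means.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(7)
d = 1400  # one response per 0.5 MHz step, as in the abstract
means = {"class_a": np.sin(np.linspace(0.0, 20.0, d)),
         "class_b": np.cos(np.linspace(0.0, 20.0, d))}

# A noisy observation generated from class_a should still correlate
# best with the class_a mean response.
x = means["class_a"] + 0.3 * rng.standard_normal(d)
print(correlation_classifier(x, means))  # class_a
```

Because correlation is invariant to scale and offset, this classifier is insensitive to overall gain differences between probes, which may be part of why it behaves consistently.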
9

A novel approach to modeling and predicting crash frequency at rural intersections by crash type and injury severity level

Deng, Jun, active 2013 24 March 2014 (has links)
Safety at intersections is of significant interest to transportation professionals due to the large number of possible conflicts that occur at those locations. In particular, rural intersections have been recognized as among the most hazardous locations on roads. However, most models of crash frequency at rural intersections, and on road segments in general, do not differentiate between crash type (such as angle, rear-end or sideswipe) and injury severity (such as fatal injury, non-fatal injury, possible injury or property damage only). Thus, there is a need to identify the differential impacts of intersection-specific and other variables on crash types and severity levels. This thesis builds upon the work of Bhat et al. (2013b) to formulate and apply a novel approach for the joint modeling of crash frequency and combinations of crash type and injury severity. The proposed framework explicitly links a count data model (for crash frequency) with a discrete choice model (for combinations of crash type and injury severity); it uses a multinomial probit kernel for the discrete choice model, introduces unobserved heterogeneity in both models, and accommodates an excess of zeros. The results show that the type of traffic control and the number of entering roads are the most important determinants of crash counts and crash type/injury severity, and our analysis underscores the value of the proposed model both for data fit and for accurately estimating variable effects.
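The count-model-plus-discrete-choice linkage can be illustrated with a bare-bones simulation: a count model generates the total crashes at a site, and a discrete choice model allocates them across crash-type categories. The Poisson rate and category shares below are hypothetical, and this sketch omits the multinomial probit kernel, unobserved heterogeneity, and zero inflation of the actual framework.

```python
import numpy as np

rng = np.random.default_rng(3)

lam = 4.0                              # expected annual crashes (count model)
shares = np.array([0.45, 0.35, 0.20])  # e.g. angle / rear-end / sideswipe

totals = rng.poisson(lam, size=1000)                    # crashes per site-year
by_type = np.array([rng.multinomial(t, shares) for t in totals])

# By construction, the per-type counts sum back to each site's total,
# which is the consistency the joint model enforces while also letting
# covariates shift both the rate and the type/severity shares.
print((by_type.sum(axis=1) == totals).all())
```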
10

Computation of High-Dimensional Multivariate Normal and Student-t Probabilities Based on Matrix Compression Schemes

Cao, Jian 22 April 2020 (has links)
The first half of the thesis focuses on the computation of high-dimensional multivariate normal (MVN) and multivariate Student-t (MVT) probabilities. Chapter 2 generalizes the bivariate conditioning method to a d-dimensional conditioning method and combines it with a hierarchical representation of the n × n covariance matrix. The resulting two-level hierarchical-block conditioning method requires Monte Carlo simulations to be performed only in d dimensions, with d ≪ n, and allows the dominant complexity term of the algorithm to be O(n log n). Chapter 3 improves the block reordering scheme from Chapter 2 and integrates it into Quasi-Monte Carlo simulation under the tile-low-rank representation of the covariance matrix. Simulations up to dimension 65,536 suggest that this method can improve the run time by one order of magnitude compared with the hierarchical Monte Carlo method. The second half of the thesis discusses a novel matrix compression scheme based on Kronecker products, an R package that implements the methods described in Chapter 3, and an application study with the probit Gaussian random field. Chapter 4 studies the potential of using a sum of Kronecker products (SKP) as a compressed covariance matrix representation. Experiments show that this SKP representation can reduce the memory footprint by one order of magnitude compared with the hierarchical representation for covariance matrices from large grids, and that Cholesky factorization in one million dimensions can be achieved within 600 seconds. In Chapter 5, an R package implementing the methods of Chapter 3 is introduced, and it is shown how the package improves the accuracy of the computed excursion sets. Chapter 6 derives the posterior properties of the probit Gaussian random field, based on which model selection and posterior prediction are performed. With the tlrmvnmvt package, the computation becomes feasible in tens of thousands of dimensions, where the prediction errors are significantly reduced.
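The memory argument behind Kronecker-product compression is easy to make concrete: if an n × n covariance is a Kronecker product of an a × a and a b × b factor with n = a·b, storing the factors needs a² + b² entries instead of n², and matrix-vector products never need the full matrix. This sketch shows a single Kronecker term with hypothetical separable exponential-correlation factors; the SKP representation in the thesis sums several such terms to approximate non-separable covariances.

```python
import numpy as np

a_dim, b_dim = 32, 32
n = a_dim * b_dim  # 1024

# Hypothetical separable covariance factors (exponential correlations).
ta = np.arange(a_dim)
tb = np.arange(b_dim)
A = np.exp(-np.abs(np.subtract.outer(ta, ta)) / 10.0)
B = np.exp(-np.abs(np.subtract.outer(tb, tb)) / 10.0)

sigma = np.kron(A, B)                  # full n x n matrix: n**2 entries
compressed = a_dim**2 + b_dim**2       # entries stored in factored form
print(n**2 // compressed)              # 512-fold reduction for this grid

# Matvecs use only the factors: (A kron B) vec(V) = vec(A V B^T)
# under row-major reshaping, so the full matrix is never formed.
rng = np.random.default_rng(0)
v = rng.standard_normal(n)
V = v.reshape(a_dim, b_dim)
assert np.allclose(sigma @ v, (A @ V @ B.T).ravel())
```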
