21

Dynamic Bayesian statistical models for the estimation of the origin-destination matrix

Anselmo Ramalho Pitombeira Neto 29 June 2015 (has links)
In transportation planning, one of the first steps is to estimate travel demand. A product of the estimation process is the so-called origin-destination matrix (OD matrix), whose entries correspond to the number of trips between pairs of zones in a geographic region in a reference time period. Traditionally, the OD matrix has been estimated through direct methods, such as home-based surveys, roadside interviews and automatic license plate recognition. These direct methods require large samples to achieve a target statistical error, which may be technically or economically infeasible. Alternatively, one can use a statistical model to indirectly estimate the OD matrix from observed traffic volumes on links of the transportation network. The first estimation models proposed in the literature assume that traffic volumes in a sequence of days are independent and identically distributed samples of a static probability distribution. Moreover, static estimation models do not allow for variations in mean OD flows or non-constant variability over time. In contrast, day-to-day dynamic models are in theory more capable of capturing underlying changes in system parameters which are only indirectly observed through variations in traffic volumes. Even so, there is still a dearth of statistical models in the literature which account for the day-to-day dynamic evolution of transportation systems. In this thesis, our objective is to assess the potential gains and limitations of day-to-day dynamic models for the estimation of the OD matrix based on link volumes. First, we review the main static and dynamic models available in the literature. We then describe our proposed day-to-day dynamic Bayesian model based on the theory of linear dynamic models. The proposed model is tested by means of computational experiments and compared with a static estimation model and with the generalized least squares (GLS) model. The results show some advantage in favor of dynamic models in informative scenarios, while in non-informative scenarios the performance of the models was equivalent. The experiments also indicate a significant dependence of the estimation errors on the assignment matrices.
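As a rough illustration of the kind of model this abstract describes, the sketch below treats mean OD flows as a random-walk latent state observed through an assignment matrix and filters them day by day with standard Kalman recursions. The network size, assignment matrix and noise levels are invented for the example, not taken from the thesis.

```python
import numpy as np

# Day-to-day dynamic linear model for OD estimation (illustrative sketch).
# State x_t: mean OD flows (one entry per OD pair); observation y_t: link volumes.
# Observation: y_t = A x_t + v_t,  v_t ~ N(0, V)    (A = assignment matrix)
# Evolution:   x_t = x_{t-1} + w_t, w_t ~ N(0, W)   (random-walk OD means)

rng = np.random.default_rng(0)
n_od, n_links, T = 4, 3, 50
A = rng.uniform(0, 1, size=(n_links, n_od))   # hypothetical assignment matrix
V = 25.0 * np.eye(n_links)                    # observation noise covariance
W = 4.0 * np.eye(n_od)                        # evolution noise covariance

x_true = np.full(n_od, 100.0)                 # latent "true" mean OD flows
m = np.full(n_od, 80.0)                       # prior mean
C = 400.0 * np.eye(n_od)                      # prior covariance

for t in range(T):
    x_true = x_true + rng.multivariate_normal(np.zeros(n_od), W)  # drift
    y = A @ x_true + rng.multivariate_normal(np.zeros(n_links), V)
    # Kalman recursions: evolve, forecast, update
    R = C + W                                 # evolved state covariance
    Q = A @ R @ A.T + V                       # forecast covariance of y
    K = np.linalg.solve(Q, A @ R).T           # gain K = R A' Q^{-1}
    m = m + K @ (y - A @ m)
    C = R - K @ A @ R

print("posterior mean OD flows:", np.round(m, 1))
print("true mean OD flows:     ", np.round(x_true, 1))
```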
22

Bayesian Mixture Modeling Approaches for Intermediate Variables and Causal Inference

Schwartz, Scott Lee January 2010 (has links)
This thesis examines causal inference topics involving intermediate variables, and uses Bayesian methodologies to advance analysis capabilities in these areas. First, joint modeling of outcome variables with intermediate variables is considered in the context of birthweight and censored gestational age analyses. The proposed methodology provides improved inference capabilities for birthweight and gestational age, avoids post-treatment selection bias problems associated with conditional-on-gestational-age analyses, and appropriately assesses the uncertainty associated with censored gestational age. Second, principal stratification methodology for settings where causal inference analysis requires appropriate adjustment of intermediate variables is extended to observational settings with binary treatments and binary intermediate variables. This is done by uncovering the structural pathways of unmeasured confounding affecting principal stratification analysis and directly incorporating them into a model-based sensitivity analysis methodology. Demonstration focuses on a study of the efficacy of influenza vaccination in elderly populations. Third, the flexibility, interpretability, and capability of principal stratification analyses for continuous intermediate variables are improved by replacing the current fully parametric methodologies with semiparametric Bayesian alternatives. This presentation is one of the first uses of nonparametric techniques in causal inference analysis, and opens a connection between these two fields. Demonstration focuses on two studies, one involving a cholesterol reduction drug, and one examining the effect of physical activity on cardiovascular disease as it relates to body mass index. / Dissertation
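To make the post-treatment selection problem concrete, here is a small simulation (not from the thesis) in which conditioning on an observed binary intermediate mixes latent principal strata, while the contrast inside a single stratum recovers the causal effect. Strata proportions and effect sizes are arbitrary assumptions.

```python
import numpy as np

# Principal stratification toy example: binary treatment Z, binary
# intermediate S. Units belong to latent strata defined by (S(0), S(1));
# conditioning on observed S mixes strata, whereas contrasts within one
# stratum (here the "always-takers") are well-defined causal quantities.
rng = np.random.default_rng(1)
n = 100_000
# Latent strata under monotonicity: never (0,0), complier (0,1), always (1,1)
strata = rng.choice(["never", "complier", "always"], size=n, p=[0.3, 0.4, 0.3])
z = rng.integers(0, 2, size=n)
s = np.where(strata == "always", 1,
             np.where((strata == "complier") & (z == 1), 1, 0))
# Outcome depends on stratum and treatment (numbers are made up);
# only always-takers have a treatment effect, equal to 0.5.
base = {"never": 1.0, "complier": 2.0, "always": 3.0}
y = np.vectorize(base.get)(strata) + 0.5 * z * (strata == "always") \
    + rng.normal(0, 1, n)

# Naive "condition on S = 1" contrast mixes always-takers with compliers:
naive = y[(z == 1) & (s == 1)].mean() - y[(z == 0) & (s == 1)].mean()
# Within the always-taker stratum the contrast recovers the true effect:
truth = y[(z == 1) & (strata == "always")].mean() \
    - y[(z == 0) & (strata == "always")].mean()
print(f"naive conditional contrast:    {naive:.2f}  (biased)")
print(f"always-taker stratum contrast: {truth:.2f}  (approx. 0.5)")
```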
23

Bayesian Nonparametric Methods for Protein Structure Prediction

Lennox, Kristin Patricia August 2010 (has links)
The protein structure prediction problem consists of determining a protein’s three-dimensional structure from the underlying sequence of amino acids. A standard approach for predicting such structures is to conduct a stochastic search of conformation space in an attempt to find a conformation that optimizes a scoring function. For one subclass of prediction protocols, called template-based modeling, a new protein is suspected to be structurally similar to other proteins with known structure. The solved related proteins may be used to guide the search of protein structure space. There are many potential applications for statistics in this area, ranging from the development of structure scores to improving search algorithms. This dissertation focuses on strategies for improving structure predictions by incorporating information about closely related “template” protein structures into searches of protein conformation space. This is accomplished by generating density estimates on conformation space via various simplifications of structure models. By concentrating a search for good structure conformations in areas that are inhabited by similar proteins, we improve the efficiency of our search and increase the chances of finding a low-energy structure. In the course of addressing this structural biology problem, we present a number of advances to the field of Bayesian nonparametric density estimation. We first develop a method for density estimation with bivariate angular data that has applications to characterizing protein backbone conformation space. We then extend this model to account for multiple angle pairs, thereby addressing the problem of modeling protein regions instead of single sequence positions. In the course of this analysis we incorporate an informative prior into our nonparametric density estimate and find that this significantly improves performance for protein loop prediction. The final piece of our structure prediction strategy is to connect side-chain locations to our torsion angle representation of the protein backbone. We accomplish this by using a Bayesian nonparametric model for dependence that can link together two or more multivariate marginal distributions. In addition to its application for our angular-linear data distribution, this dependence model can serve as an alternative to nonparametric copula methods.
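A toy sketch of the nonparametric ingredient mentioned above: sampling an angular density on the torus from a truncated Dirichlet process mixture of von Mises components via stick-breaking. Independent von Mises components and the hyperparameter values are simplifying assumptions, not the dissertation's actual bivariate angular model.

```python
import numpy as np

# Sample a density on the torus (phi, psi) from a Dirichlet process mixture
# of (independent) von Mises components, using truncated stick-breaking.
rng = np.random.default_rng(2)
alpha, K = 1.0, 20                    # DP concentration, truncation level

# Stick-breaking weights: w_k = v_k * prod_{j<k} (1 - v_j)
v = rng.beta(1, alpha, size=K)
w = v * np.concatenate(([1.0], np.cumprod(1 - v)[:-1]))
w /= w.sum()                          # renormalise after truncation

# Component parameters drawn from a base measure on the torus
mu = rng.uniform(-np.pi, np.pi, size=(K, 2))   # mean angles (phi, psi)
kappa = rng.gamma(2.0, 10.0, size=K)           # concentration per component

def sample(n):
    """Draw n (phi, psi) pairs from the truncated DP mixture."""
    comp = rng.choice(K, size=n, p=w)
    return rng.vonmises(mu[comp], kappa[comp, None])

angles = sample(1000)
print("sample shape:", angles.shape)           # (1000, 2), values in [-pi, pi)
```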
24

Learning in integrated optimization models of climate change and economy

Shayegh, Soheil 21 September 2015 (has links)
Integrated assessment models are powerful tools for providing insight into the interaction between the economy and climate change over a long time horizon. However, knowledge of climate parameters and their behavior under extreme circumstances of global warming is still an active area of research. In this thesis we incorporated the uncertainty in one of the key parameters of climate change, climate sensitivity, into an integrated assessment model and showed how this affects the choice of optimal policies and actions. We constructed a new, multi-step-ahead approximate dynamic programming (ADP) algorithm to study the effects of the stochastic nature of climate parameters. We considered the effect of stochastic extreme events in climate change (tipping points) with large economic loss. The risk of an extreme event drives tougher GHG reduction actions in the near term. On the other hand, the optimal policies in post-tipping point stages are similar to or below the deterministic optimal policies. Once the tipping point occurs, the ensuing optimal actions tend toward more moderate policies. Previous studies have shown the impacts of economic and climate shocks on the optimal abatement policies but did not address the correlation among uncertain parameters. With uncertain climate sensitivity, the risk of extreme events is linked to the variations in the climate sensitivity distribution. We developed a novel Bayesian framework to endogenously interrelate the two stochastic parameters. The results in this case are clustered around the pre-tipping point optimal policies of the deterministic climate sensitivity model. Tougher actions are more frequent as there is more uncertainty in the likelihood of extreme events in the near future. This affects the optimal policies in post-tipping point states as well, as they tend toward more conservative actions. As we proceed in time toward the future, the (binary) status of the climate will be observed and the prior distribution of the climate sensitivity parameter will be updated. The cost and climate tradeoffs of new technologies are key to decisions in climate policy. Here we focus on the electricity generation industry and contrast the extremes in electricity generation choices: making choices on new generation facilities based on cost only, in the absence of any climate policy, versus making choices based on climate impacts only, regardless of generation costs. When the expected drop in cost as experience grows is taken into account in selecting the generation portfolio, renewable technologies displace coal and natural gas within two decades on a pure cost-minimization basis, even when climate damage is not considered in the choice of technologies. This is the "natural gas as a bridge fuel" scenario, and the technology advancement needed to bring down the cost of renewables requires some commitment to renewables generation in the near term. Adopting the objective of minimizing climate damage, essentially moving immediately to low greenhouse gas generation technologies, results in faster cost reduction of new technologies and may result in different technologies becoming dominant in global electricity generation. Thus today's choices for new electricity generation by individual countries and utilities have implications not only for their direct costs and the global climate, but also for the future costs and availability of emerging electricity generation options.
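The Bayesian updating step described above can be sketched as follows: a discrete prior over climate sensitivity is revised as the (binary) tipping status of the climate is observed. The sensitivity grid and the hazard function below are illustrative assumptions only, not the thesis's calibrated model.

```python
import numpy as np

# Discrete Bayesian update of climate-sensitivity beliefs from the observed
# (binary) tipping status of the climate system.
s_grid = np.array([1.5, 2.5, 3.5, 4.5, 5.5])      # climate sensitivity (deg C)
prior = np.array([0.10, 0.30, 0.35, 0.15, 0.10])  # prior beliefs

def tipping_prob(s, temp_rise):
    """Assumed hazard: higher sensitivity -> higher chance of tipping."""
    return 1 - np.exp(-0.05 * s * temp_rise)

def update(belief, tipped, temp_rise):
    like = tipping_prob(s_grid, temp_rise)
    like = like if tipped else 1 - like
    post = belief * like
    return post / post.sum()

# Observe "no tipping" for several periods, then a tipping event:
belief = prior
for _ in range(5):
    belief = update(belief, tipped=False, temp_rise=2.0)
belief_after_tip = update(belief, tipped=True, temp_rise=2.0)
print("posterior mean sensitivity before tip: %.2f" % (s_grid @ belief))
print("posterior mean sensitivity after tip:  %.2f" % (s_grid @ belief_after_tip))
```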
25

Statistical Characterization of Protein Ensembles

Fisher, Charles January 2012 (has links)
Conformational ensembles are models of proteins that capture variations in conformation that result from thermal fluctuations. Ensemble-based models are important tools for studying intrinsically disordered proteins (IDPs), which adopt a heterogeneous set of conformations in solution. In order to construct an ensemble that provides an accurate model for a protein, one must identify a set of conformations, and their relative stabilities, that agree with experimental data. Inferring the characteristics of an ensemble for an IDP is a problem plagued by degeneracy; that is, one can typically construct many different ensembles that agree with any given set of experimental measurements. In light of this problem, this thesis will introduce three tools for characterizing ensembles: (1) an algorithm for modeling ensembles that provides estimates for the uncertainty in the resulting model, (2) a fast algorithm for constructing ensembles for large or complex IDPs and (3) a measure of the degree of disorder in an ensemble. Our hypothesis is that a protein can be accurately modeled as an ensemble only when the degeneracy of the model is appropriately accounted for. We demonstrate these methods by constructing ensembles for K18 tau protein, α-synuclein and amyloid beta, IDPs that are implicated in the pathogenesis of Alzheimer's and Parkinson's diseases.
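The degeneracy problem can be illustrated with a toy reweighting exercise (synthetic numbers, not the thesis's algorithms): two very different weightings of the same conformations, a maximum-entropy-style reweighting and a sparse hand-picked one, reproduce the same experimental average while having wildly different effective ensemble sizes.

```python
import numpy as np

# Degeneracy illustration: many ensemble weightings match one observable.
# Maximum-entropy style weights w_i ∝ exp(lambda * o_i) are tuned so the
# weighted mean of a per-conformation observable o matches a target value.
rng = np.random.default_rng(3)
o = rng.normal(0.0, 1.0, size=500)    # per-conformation predicted observable
target = 0.4                          # "experimental" ensemble average

def weights(lam):
    w = np.exp(lam * o)
    return w / w.sum()

# Solve weights(lam) @ o = target by bisection (the mean is monotone in lam)
lo_, hi_ = -10.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo_ + hi_)
    lo_, hi_ = (mid, hi_) if weights(mid) @ o < target else (lo_, mid)
w_maxent = weights(0.5 * (lo_ + hi_))

# A completely different, sparse ensemble matching the same average:
top = np.argsort(np.abs(o - target))[:25]
w_sparse = np.zeros_like(o)
w_sparse[top] = 1.0 / len(top)

print("maxent fit: mean = %.3f, effective size = %.0f"
      % (w_maxent @ o, 1 / (w_maxent ** 2).sum()))
print("sparse fit: mean = %.3f, effective size = %.0f"
      % (w_sparse @ o, 1 / (w_sparse ** 2).sum()))
```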
26

Accurate Surveillance of Diabetes Mellitus in Nova Scotia within the General Population and the Five First Nations of Cape Breton

Clark, Roderick 03 October 2011 (has links)
Administrative data are one of the most commonly used data sources for diagnosed diabetes surveillance within Canada. Despite their widespread use, administrative case definitions have not been validated in many of the minority populations to which they are commonly applied. Additionally, previous validation work has not evaluated the effect of conditional covariance between data sources, which has been widely shown to significantly bias parameter (sensitivity, specificity, and prevalence) estimation. Using administrative data and data sources which contained gold-standard cases of diabetes, this thesis examined (1) the validity of commonly used administrative case definitions for identifying cases of diagnosed diabetes within an Aboriginal population at the sub-provincial level, and (2) the effect of conditional covariance on parameter estimates of an administrative case definition used to identify cases of diagnosed diabetes within the general population of Nova Scotia. We found significant differences in the sensitivity and specificity of a commonly used administrative case definition when applied to an Aboriginal population at the sub-provincial level. For the general population of Nova Scotia, we found that including a parameter to estimate conditional covariance between data sources resulted in significant variation in sensitivity, specificity, and prevalence estimates as compared to a study which did not consider this parameter. We conclude that work must continue to validate administrative case definitions both within minority populations and for the general population to enhance diabetes surveillance systems in Canada. / Validation study for administrative case definitions to identify cases of diagnosed diabetes in Canada
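A back-of-the-envelope illustration of why conditional covariance matters (hypothetical numbers, perfect specificity assumed for brevity): positive error correlation between two case-finding sources biases an independence-based two-source estimate of the number of diagnosed cases.

```python
import numpy as np

# Two case-finding sources (e.g. administrative data and a registry) flag
# true cases with sensitivities se1, se2, but their errors are correlated
# within true cases (conditional covariance c). Ignoring c biases a
# capture-recapture-style prevalence estimate. All values are hypothetical;
# perfect specificity is assumed to keep the arithmetic minimal.
prev, se1, se2, c = 0.08, 0.85, 0.80, 0.05
n = 1_000_000

p11_case = se1 * se2 + c      # P(both sources flag a true case), dependent
n_cases = prev * n
n11 = n_cases * p11_case      # flagged by both
n1 = n_cases * se1            # flagged by source 1
n2 = n_cases * se2            # flagged by source 2

# Naive two-source (Lincoln-Petersen) estimate assumes independence:
n_hat = n1 * n2 / n11
print("true number of cases:         %.0f" % n_cases)
print("estimate ignoring covariance: %.0f" % n_hat)   # biased downward here
```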
27

A Bayesian/MCMC Approach to Galaxy Modelling: NGC 6503

PUGLIELLI, DAVID 11 January 2010 (has links)
We use Bayesian statistics and Markov chain Monte Carlo (MCMC) techniques to construct dynamical models for the spiral galaxy NGC 6503. The constraints include surface brightness profiles which display a Freeman Type II structure; HI and ionized gas rotation curves; the stellar rotation, which is nearly coincident with the ionized gas curve; and the line-of-sight stellar dispersion, which displays a σ-drop at the centre. The galaxy models consist of a Sérsic bulge, an exponential disc with an optional inner truncation and a cosmologically motivated dark halo. The Bayesian/MCMC technique yields the joint posterior probability distribution function for the input parameters, allowing constraints on model parameters such as the halo cusp strength, structural parameters for the disc and bulge, and mass-to-light ratios. We examine several interpretations of the data: the Type II surface brightness profile may be due to dust extinction, to an inner truncated disc or to a ring of bright stars; and we test separate fits to the gas and stellar rotation curves to determine if the gas traces the gravitational potential. We test each of these scenarios for bar stability, ruling out dust extinction. We also find that the gas cannot trace the gravitational potential, as the asymmetric drift is then too large to reproduce the stellar rotation. The disc is well fit by an inner-truncated profile, but the possibility of ring formation by a bar to reproduce the Type II profile is also a realistic model. We further find that the halo must have a cuspy profile with γ ≳ 1; the bulge has a lower M/L than the disc, suggesting a star-forming component in the centre of the galaxy; and the bulge, as expected for this late-type galaxy, has a low Sérsic index with n_b ∼ 1-2, suggesting a formation history dominated by secular evolution. / Thesis (Ph.D, Physics, Engineering Physics and Astronomy) -- Queen's University, 2010-01-10 00:11:41.946
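The machinery behind such an analysis can be sketched with a tiny Metropolis-Hastings sampler fitting a two-parameter toy rotation curve. The real model space (bulge, disc, halo, dust scenarios) is far larger, and the functional form, priors and noise level below are invented for illustration.

```python
import numpy as np

# Minimal Metropolis-Hastings sketch: fit a toy rotation curve
# v(r) = v_max * r / (r + r_s) to noisy synthetic data.
rng = np.random.default_rng(4)
r = np.linspace(0.5, 10, 30)
v_obs = 120 * r / (r + 2.0) + rng.normal(0, 5, r.size)   # synthetic "data"
sigma = 5.0                                              # known noise level

def log_post(theta):
    v_max, r_s = theta
    if v_max <= 0 or r_s <= 0:          # flat priors on positive values
        return -np.inf
    resid = v_obs - v_max * r / (r + r_s)
    return -0.5 * np.sum((resid / sigma) ** 2)

theta = np.array([100.0, 1.0])
lp = log_post(theta)
chain = []
for _ in range(20_000):
    prop = theta + rng.normal(0, [2.0, 0.1])             # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:             # accept/reject
        theta, lp = prop, lp_prop
    chain.append(theta)

chain = np.array(chain)[5000:]                           # discard burn-in
print("posterior mean v_max = %.1f, r_s = %.2f" % tuple(chain.mean(axis=0)))
```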
28

Development and Application of Statistical and Machine Learning Techniques in Probabilistic Astronomical Catalogue-Matching Problems

David Rohde Unknown Date (has links)
Advances in the development of detector and computer technology have led to a rapid increase in the availability of large datasets to the astronomical community. This has created opportunities to do science that would otherwise be difficult or impossible. At the same time, astronomers have acknowledged that this influx of data creates new challenges in the development of tools and practice to facilitate usage of this technology by the international community. A world-wide effort known as the Virtual Observatory has developed to this end, involving collaborations between astronomers, computer scientists and statisticians. Different telescopes survey the sky in different wavelengths, producing catalogues of objects containing observations of both positional and non-positional properties. Because multiple catalogues exist, a common situation is that two catalogues contain observations of the same piece of sky (e.g. one sparse catalogue with relatively few objects per unit area, and one dense catalogue with many more objects per unit area). Identifying matches, i.e. different observations of the same object in different catalogues, is an important step in building a multi-wavelength understanding of the universe. Positional properties of objects can be used in some cases to perform catalogue matching; in other cases position alone is insufficient to determine matching objects. This thesis applies machine learning and statistical methods to explore the usefulness of non-positional properties in identifying objects common to two different catalogues. A machine learning classification system is shown to be able to identify these objects in a particular problem domain. It is shown that non-positional inputs can be very beneficial in identifying matches for a particular problem. The result is that supervised learning is shown to be a viable method to be applied in difficult catalogue matching problems. The use of probabilistic outputs is developed as an enhancement in order to give a means of identifying the uncertainty in the matches. Something that distinguishes this problem from standard pattern classification problems is that one class, the matches, belongs to a high-dimensional distribution whereas the non-matches belong to a lower-dimensional distribution. This assumption is developed in a probabilistic framework. The result is a class of probability models useful for catalogue matching and a number of tests for the suitability of the computed probabilities. The tests were applied to a problem and showed a good classification rate, good results obtained by scoring rules and good calibration. Visual inspection of the output also suggested that the algorithm was behaving in a sensible way. While reasonable results are obtained, it is acknowledged that the question of whether a probability is a good probability is philosophically awkward. One goal of analysing matched or unmatched astronomical catalogues is to make accurate inferential statements on the basis of the available data. A silent assumption is often made that the first step in analysing unmatched catalogues is to find the best match between them, then to plot this best-match data assuming it to be correct. This thesis shows that this assumption is false: inferential statements based on the best-match data can potentially be quite misleading. To address this problem a new framework for catalogue matching, based on Bayesian statistics, is developed.
In this Bayesian framework it is unnecessary for the method to commit to a single matched dataset; rather, the ensemble of all possible matches can be used. This method compares favourably to methods based upon choosing the single most likely match. The result is the outline of a method for analysing astronomical datasets not via a scatter plot obtained from a perfectly known pre-matched list of data, but rather via predictive distributions which need not be based on a perfect list and indeed might be based upon unmatched or partly matched catalogues.
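A compact sketch of the ensemble-of-matches idea: for one sparse-catalogue source, keep a posterior over every dense-catalogue candidate plus a "no counterpart" hypothesis rather than committing to the single best match. The Gaussian positional model, field density and prior match probability are simplifying assumptions for illustration, not the thesis's actual formulation.

```python
import numpy as np

# Posterior over candidate counterparts for one sparse-catalogue source,
# including a "no match" hypothesis, instead of a hard best-match decision.
rng = np.random.default_rng(5)
sigma = 1.0          # positional uncertainty (arcsec), assumed Gaussian
rho = 0.02           # dense-catalogue source density (per arcsec^2), assumed
p_match = 0.9        # prior probability the source has a counterpart

src = np.array([0.0, 0.0])
cand = rng.uniform(-5, 5, size=(6, 2))        # candidate positions

# Positional likelihood of each candidate being the true counterpart:
d2 = np.sum((cand - src) ** 2, axis=1)
like = np.exp(-0.5 * d2 / sigma**2) / (2 * np.pi * sigma**2)

# Posterior over candidates plus "no counterpart", with the chance-alignment
# alternative weighted by the background field density rho:
weights = np.append(p_match * like / len(cand), (1 - p_match) * rho)
post = weights / weights.sum()
print("P(candidate i is the match):", np.round(post[:-1], 3))
print("P(no match):", round(post[-1], 3))
```

Non-positional properties would enter the same calculation as extra likelihood factors on each candidate, which is where the thesis's supervised classifiers come in.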
29

On Bayesian optimization and its application to hyperparameter tuning

Matosevic, Antonio January 2018 (has links)
This thesis introduces the concept of Bayesian optimization, primarily used in optimizing costly black-box functions. Besides a theoretical treatment of the topic, the focus of the thesis is on two numerical experiments. Firstly, different types of acquisition functions, which are the key components responsible for the performance, are tested and compared. Special emphasis is placed on the analysis of the so-called exploration-exploitation trade-off. Secondly, one of the most recent applications of Bayesian optimization concerns hyperparameter tuning in machine learning algorithms, where the objective function is expensive to evaluate and not given analytically. However, some results indicate that much simpler methods can give similar results. Our contribution is therefore a statistical comparison of simple random search and Bayesian optimization in the context of finding the optimal set of hyperparameters in support vector regression. It has been found that there is no significant difference in the performance of these two methods.
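The comparison the abstract describes can be reproduced in miniature: a small Gaussian-process Bayesian optimization loop with the expected-improvement acquisition against pure random search at the same evaluation budget. The 1-D objective and kernel settings are made up, not the thesis's experimental setup.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x     # toy objective to maximise
grid = np.linspace(-2, 2, 400)                     # candidate points

def rbf(a, b, ls=0.4):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean and standard deviation at test points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y
    var = np.clip(np.diag(rbf(Xs, Xs) - Ks.T @ sol), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sd, best):
    z = (mu - best) / sd
    return (mu - best) * norm.cdf(z) + sd * norm.pdf(z)

# Bayesian optimization: 3 initial points + 10 acquisitions
X = np.array([-1.5, 0.0, 1.5])
y = f(X)
for _ in range(10):
    mu, sd = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sd, y.max()))]
    X, y = np.append(X, x_next), np.append(y, f(x_next))

x_rand = rng.uniform(-2, 2, size=13)               # same evaluation budget
print("BO best:            %.4f" % y.max())
print("random-search best: %.4f" % f(x_rand).max())
```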
30

Linear Growth Bayesian Model Using Discount Factors

CRISTIANO AUGUSTO COELHO FERNANDES 17 November 2006 (has links)
The aim of this thesis is to discuss in detail the multiprocess linear growth Bayesian model for seasonal and/or non-seasonal series, using discount factors. The original formulation of this model was put forward by Ameen and Harrison. In the first part of the thesis (chapters 2 and 3) we present general concepts related to time series and the main models in the literature, whereas in the second part (chapters 4, 5 and 6) we formally present Bayesian statistics (general concepts), the dynamic linear model in its original formulation, and the proposed model. Some operational suggestions and a flow chart for operating the model are also presented, with a view to a future computational implementation.
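A minimal sketch of discounting in a linear growth DLM, in the spirit of the Ameen and Harrison formulation discussed above: the evolution covariance is induced by a discount factor rather than specified directly. Data and settings are invented for the example.

```python
import numpy as np

# Linear growth DLM with a discount factor: instead of specifying the
# evolution covariance W, the evolved prior covariance is inflated each
# step by a discount delta in (0, 1]: R_t = G C G' / delta.
rng = np.random.default_rng(7)
T, delta, V = 60, 0.9, 1.0
G = np.array([[1.0, 1.0], [0.0, 1.0]])   # level + growth evolution matrix
F = np.array([1.0, 0.0])                 # we observe the level only

y = 0.5 * np.arange(T) + rng.normal(0, 1, T)   # synthetic trending series

m = np.array([0.0, 0.0])                 # prior mean (level, growth)
C = np.eye(2) * 10.0                     # prior covariance
for t in range(T):
    a = G @ m                            # evolved mean
    R = G @ C @ G.T / delta              # discounting replaces W
    f_t = F @ a                          # one-step forecast
    q = F @ R @ F + V                    # forecast variance
    A = R @ F / q                        # adaptive coefficient
    e = y[t] - f_t                       # forecast error
    m = a + A * e
    C = R - np.outer(A, A) * q

print("final level estimate:  %.2f (last observation %.2f)" % (m[0], y[-1]))
print("final growth estimate: %.2f (true growth 0.5)" % m[1])
```

Smaller delta discounts old information faster, so the model adapts more quickly to structural change at the price of noisier estimates; delta = 1 recovers a static (no-evolution) model.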
