• About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A simulation study of confidence intervals for the transition matrix of a reversible Markov chain

Zhang, Xiaojing January 1900
Master of Science / Department of Statistics / James W. Neill
2

Natechs and Climate Change: Wide-scale Spatial Modeling of the Occurrence Probability and Variability of Tropical Storm-Related Natech Events in the United States Under Various Climate Scenarios / Natech災害と気候変動:多様な気候シナリオの下での米国における熱帯低気圧を引き金としたNatech事象の発生率と変動性に関する広範囲の空間モデリング

Xiaolong, Luo 23 March 2021
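Kyoto University / New system, course doctorate / Doctor of Engineering / Degree No. Kō 23170 / Engineering Doctorate No. 4814 / 新制||工||1752 (University Library) / Department of Urban Management, Graduate School of Engineering, Kyoto University / (Chief examiner) Professor CRUZ Ana Maria, Professor 宇野 伸宏, Associate Professor 横松 宗太 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Philosophy (Engineering) / Kyoto University / DFAM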
3

[en] FUZZY PROBABILITY ESTIMATION FROM IMPRECISE DATA / [pt] ESTIMAÇÃO DE PROBABILIDADE FUZZY A PARTIR DE DADOS IMPRECISOS

ALEXANDRE ROBERTO RENTERIA 20 April 2007
[pt] Existem três tipos de incerteza: a de natureza aleatória, a gerada pelo conhecimento incompleto e a que ocorre em função do conhecimento vago ou impreciso. Há casos em que dois tipos de incerteza estão presentes, em especial nos experimentos aleatórios a partir de dados imprecisos. Para modelar a aleatoriedade quando a distribuição de probabilidade que rege o experimento não é conhecida, deve-se utilizar um método de estimação não-paramétrico, tal como a janela de Parzen. Já a incerteza de medição, presente em qualquer medida de uma grandeza física, dá origem a dados imprecisos, tradicionalmente modelados por conceitos probabilísticos. Entretanto, como a probabilidade se aplica à análise de eventos aleatórios, mas não captura a imprecisão no evento, esta incerteza pode ser melhor representada por um número fuzzy segundo a transformação probabilidade-possibilidade superior. Neste trabalho é proposto um método de estimação não-paramétrico baseado em janela de Parzen para estimação da probabilidade fuzzy a partir de dados imprecisos. / [en] There are three kinds of uncertainty: one due to randomness, another due to incomplete knowledge, and a third due to vague or imprecise knowledge. Sometimes two kinds of uncertainty occur at the same time, especially in random experiments based on imprecise data. To model randomness when the probability distribution governing an experiment is unknown, a non-parametric estimation method, such as the Parzen window, must be used. Measurement uncertainty, present in any measurement of a physical quantity, gives rise to imprecise data, traditionally modelled through probability concepts. However, since probability applies to random events but does not capture their imprecision, this kind of uncertainty can be better represented by a fuzzy number obtained through the superior probability-possibility transformation. This thesis proposes a non-parametric estimation method based on the Parzen window to estimate fuzzy probability from imprecise data.
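The crisp building block named in this abstract is the classical Parzen-window estimator. The sketch below assumes a Gaussian kernel and a user-chosen bandwidth `h` (both assumptions, as are the function names); it shows only the ordinary estimator and does not reproduce the thesis's fuzzy extension for imprecise data.

```python
import math


def parzen_density(x, samples, h):
    """Gaussian-kernel Parzen-window estimate of the density at x.

    f_hat(x) = 1 / (n * h) * sum_i K((x - x_i) / h), with K the standard normal pdf.
    """
    n = len(samples)
    scale = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)


def parzen_interval_probability(a, b, samples, h):
    """Probability that the Parzen estimate assigns to the interval [a, b].

    Each Gaussian kernel integrates to a difference of normal CDFs,
    so no numerical quadrature is needed.
    """
    def std_normal_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    n = len(samples)
    return sum(std_normal_cdf((b - xi) / h) - std_normal_cdf((a - xi) / h)
               for xi in samples) / n


# Hypothetical measurements; the bandwidth h = 0.5 is an arbitrary choice.
data = [1.0, 2.0, 2.5, 3.0]
print(parzen_density(2.0, data, 0.5))
print(parzen_interval_probability(1.5, 2.5, data, 0.5))
```

The bandwidth controls the trade-off between smoothness and fidelity to the samples; in practice it would be chosen by a selection rule rather than fixed by hand.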
4

An Analysis Of Misclassification Rates For Decision Trees

Zhong, Mingyu 01 January 2007
The decision tree is a well-known methodology for classification and regression. In this dissertation, we focus on minimizing the misclassification rate of decision tree classifiers. We derive the equations that give the optimal tree prediction, the estimated risk of the tree's prediction, and the reliability of that risk estimate. We carry out an extensive analysis of applying Lidstone's law of succession to estimate the class probabilities. In contrast to existing research, we compute not only the expected values of the risks but also the corresponding reliability of the risk (measured by standard deviations). We also provide an explicit expression for the k-norm estimate of the tree's misclassification rate, which combines both the expected value and the reliability. Furthermore, our proposed and proven theorem on k-norm estimation suggests an efficient pruning algorithm that has a clear theoretical interpretation, is easily implemented, and does not require a validation set. Our experiments show that the proposed pruning algorithm quickly produces accurate trees and compares very favorably with two other well-known pruning algorithms: cost-complexity pruning (CCP) in CART and error-based pruning (EBP) in C4.5. Finally, our work provides a deeper understanding of decision trees.
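Lidstone's law of succession smooths the raw class frequencies at a tree node by adding a pseudo-count to every class. A minimal sketch, assuming K classes and a smoothing parameter `lam` (our name); the misclassification estimate returned here is the simple plug-in value 1 - max p, not the thesis's k-norm estimate.

```python
from typing import List, Tuple


def lidstone_probabilities(counts: List[int], lam: float = 1.0) -> List[float]:
    """Lidstone estimate of the class probabilities at a node.

    p_i = (n_i + lam) / (N + K * lam), where N = sum(counts) and K = len(counts).
    lam = 1 gives Laplace's rule of succession; lam = 0 gives relative frequencies.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    denom = sum(counts) + len(counts) * lam
    if denom == 0:
        raise ValueError("need lam > 0 or at least one observation")
    return [(c + lam) / denom for c in counts]


def leaf_prediction(counts: List[int], lam: float = 1.0) -> Tuple[int, float]:
    """Predict the most probable class and return the plug-in misclassification estimate."""
    probs = lidstone_probabilities(counts, lam)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, 1.0 - probs[best]


# Hypothetical leaf holding 3 examples of class 0 and 1 example of class 1.
print(leaf_prediction([3, 1]))
```

With few examples per leaf the smoothing noticeably raises the estimated error (here 1/3 instead of the raw 1/4), which is the effect that makes such estimates useful for pruning decisions.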
5

The Generalized Splitting method for Combinatorial Counting and Static Rare-Event Probability Estimation

Zdravko Botev Unknown Date
This thesis is divided into two parts. In the first part we describe a new Monte Carlo algorithm for the consistent and unbiased estimation of multidimensional integrals and the efficient sampling from multidimensional densities. The algorithm is inspired by the classical splitting method and can be applied to general static simulation models. We provide examples from rare-event probability estimation, counting, optimization, and sampling, demonstrating that the proposed method can outperform existing Markov chain sampling methods in terms of convergence speed and accuracy. In the second part we present a new adaptive kernel density estimator based on linear diffusion processes. The proposed estimator builds on existing ideas for adaptive smoothing by incorporating information from a pilot density estimate. In addition, we propose a new plug-in bandwidth selection method that is free from the arbitrary normal reference rules used by existing methods. We present simulation examples in which the proposed approach outperforms existing methods in terms of accuracy and reliability.
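The Generalized Splitting algorithm itself is not reproduced here. As a hedged illustration of the classical splitting idea it builds on, the sketch below is a plain adaptive multilevel splitting estimator for P(S(X) >= gamma) with X ~ N(0, I). The preconditioned Crank-Nicolson move, the quantile parameter `p0`, and all names are our assumptions; this variant is consistent but not exactly unbiased.

```python
import math
import random


def adaptive_multilevel_splitting(score, dim, gamma, n=2000, p0=0.1, rho=0.8,
                                  mcmc_steps=20, seed=0, max_levels=100):
    """Estimate P(score(X) >= gamma) for X ~ N(0, I_dim) by adaptive multilevel splitting.

    Intermediate levels are the (1 - p0)-quantiles of the current particle scores.
    Particles are moved with a preconditioned Crank-Nicolson (pCN) proposal, which
    leaves N(0, I) invariant; rejecting proposals that fall below the current level
    keeps N(0, I) restricted to {score >= level} invariant.
    """
    rng = random.Random(seed)
    particles = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    estimate = 1.0
    for _ in range(max_levels):
        scores = sorted(score(x) for x in particles)
        level = scores[int((1.0 - p0) * n)]
        if level >= gamma:
            break  # gamma is now reached by a sizeable fraction of particles
        elites = [x for x in particles if score(x) >= level]
        estimate *= len(elites) / n  # conditional probability of exceeding this level
        new_particles = []
        for _ in range(n):
            x = list(rng.choice(elites))  # clone a random elite particle
            for _ in range(mcmc_steps):
                proposal = [rho * xi + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
                            for xi in x]
                if score(proposal) >= level:
                    x = proposal
            new_particles.append(x)
        particles = new_particles
    hits = sum(1 for x in particles if score(x) >= gamma)
    return estimate * hits / n


# Toy check: P(Z >= 4) for Z ~ N(0, 1) is about 3.17e-5.
p_hat = adaptive_multilevel_splitting(lambda x: x[0], dim=1, gamma=4.0)
print(p_hat)
```

Crude Monte Carlo would need millions of samples to see even a handful of events at this probability; splitting instead multiplies a few conditional probabilities of order `p0`, each estimated from a moderate number of particles.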
6

Cost-sensitive boosting: a unified approach

Nikolaou, Nikolaos January 2016
In this thesis we provide a unifying framework for two decades of work in an area of Machine Learning known as cost-sensitive Boosting algorithms. This area is concerned with the fact that most real-world prediction problems are asymmetric, in the sense that different types of errors incur different costs. Adaptive Boosting (AdaBoost) is one of the most well-studied and utilised algorithms in Machine Learning, with rich theoretical depth as well as practical uptake across numerous industries. However, its inability to handle asymmetric tasks has been the subject of much criticism. As a result, numerous cost-sensitive modifications of the original algorithm have been proposed, each with its own motivations and its own claims to superiority. Through a thorough analysis of the literature from 1997 to 2016, we find 15 distinct cost-sensitive Boosting variants, discounting minor variations. We critique the literature using four powerful theoretical frameworks: Bayesian decision theory, the functional gradient descent view, margin theory, and probabilistic modelling. From each framework, we derive a set of properties that boosting algorithms must obey. We find that only 3 of the published AdaBoost variants are consistent with the rules of all the frameworks, and even they require their outputs to be calibrated to achieve this. Experiments on 18 datasets, across 21 degrees of cost asymmetry, all support this finding, showing that once calibrated, the three variants perform equivalently and outperform all others. Our final recommendation, based on theoretical soundness, simplicity, flexibility and performance, is to use the original AdaBoost algorithm with a shifted decision threshold and calibrated probability estimates. The conclusion is that novel cost-sensitive boosting algorithms are unnecessary if proper calibration is applied to the original.
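The recommended recipe, a calibrated probability plus a shifted threshold, follows from Bayesian decision theory. A minimal sketch, assuming a Platt-style sigmoid with parameters `a` and `b` already fitted on held-out data (fitting not shown) and misclassification costs `c_fp` and `c_fn`; the function names are hypothetical and this is not the thesis's code.

```python
import math


def platt_probability(score, a, b):
    """Map a raw boosting score F(x) to an estimate of P(y = 1 | x) via the sigmoid
    1 / (1 + exp(a * F + b)), computed in a numerically stable way.
    a and b are assumed to have been fitted on held-out data."""
    z = a * score + b
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def cost_sensitive_threshold(c_fp, c_fn):
    """Bayes-optimal threshold on a calibrated P(y = 1 | x).

    Predicting 1 costs (1 - p) * c_fp in expectation and predicting 0 costs p * c_fn,
    so predicting 1 is cheaper exactly when p > c_fp / (c_fp + c_fn).
    """
    return c_fp / (c_fp + c_fn)


def predict(prob_positive, c_fp, c_fn):
    """Cost-sensitive decision from a calibrated probability."""
    return 1 if prob_positive > cost_sensitive_threshold(c_fp, c_fn) else 0


# Hypothetical example: a missed positive is 9 times as costly as a false alarm.
p = platt_probability(score=-0.4, a=-2.0, b=0.0)
print(round(p, 3), predict(p, c_fp=1.0, c_fn=9.0), predict(p, c_fp=1.0, c_fn=1.0))
```

Only the threshold depends on the costs, so the same calibrated model can be reused when the cost ratio changes.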
7

Simulation of Strong Ground Motions in Mashiki Town, Kumamoto, Based on the Seismic Response Analysis of Soils and the Dynamic Rupture Modeling of Sources / 地盤応答解析および動力学的震源モデルに基づく熊本県益城町における強震動シミュレーション

Sun, Jikai 23 March 2021
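Kyoto University / New system, course doctorate / Doctor of Engineering / Degree No. Kō 23188 / Engineering Doctorate No. 4832 / 新制||工||1755 (University Library) / Department of Architecture and Architectural Engineering, Graduate School of Engineering, Kyoto University / (Chief examiner) Professor 松島 信一, Professor 竹脇 出, Professor 林 康裕 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Philosophy (Engineering) / Kyoto University / DFAM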
8

Scalable Estimation and Testing for Complex, High-Dimensional Data

Lu, Ruijin 22 August 2019
With modern high-throughput technologies, scientists can now collect high-dimensional data of various forms, including brain images, medical spectrum curves, and engineering signals. These data provide a rich source of information on disease development, cell evolution, engineering systems, and many other scientific phenomena. To achieve a clearer understanding of the underlying mechanisms, one needs a fast and reliable analytical approach to extract useful information from the wealth of data. The goal of this dissertation is to develop novel methods that enable scalable estimation, testing, and analysis of complex, high-dimensional data. It contains three parts: parameter estimation based on complex data, powerful testing of functional data, and the analysis of functional data supported on manifolds. The first part focuses on a family of parameter estimation problems in which the relationship between data and the underlying parameters cannot be explicitly specified using a likelihood function. We introduce a wavelet-based approximate Bayesian computation approach that is likelihood-free and computationally scalable. This approach is applied to two problems: estimating mutation rates of a generalized birth-death process from fluctuation experiment data, and estimating the parameters of targets from foliage echoes. The second part focuses on functional testing. We consider multiple testing in basis space via p-value-guided compression. Our theoretical results demonstrate that, under regularity conditions, the Westfall-Young randomization test in basis space achieves strong control of the family-wise error rate and asymptotic optimality. Furthermore, appropriate compression in basis space leads to improved power compared with point-wise testing in the data domain or basis-space testing without compression.
The effectiveness of the proposed procedure is demonstrated through two applications: the detection of regions of spectral curves associated with pre-cancer using 1-dimensional fluorescence spectroscopy data, and the detection of disease-related regions using 3-dimensional Alzheimer's Disease neuroimaging data. The third part focuses on analyzing data measured on the cortical surfaces of monkeys' brains during early development, where subjects are measured at misaligned time markers. In this analysis, we examine asymmetric patterns and increasing/decreasing trends in the monkeys' brains over time. / Doctor of Philosophy / With modern high-throughput technologies, scientists can now collect high-dimensional data of various forms, including brain images, medical spectrum curves, engineering signals, and biological measurements. These data provide a rich source of information on disease development, engineering systems, and many other scientific phenomena. The goal of this dissertation is to develop novel methods that enable scalable estimation, testing, and analysis of complex, high-dimensional data. It contains three parts: parameter estimation based on complex biological and engineering data, powerful testing of high-dimensional functional data, and the analysis of functional data supported on manifolds. The first part focuses on a family of parameter estimation problems in which the relationship between data and the underlying parameters cannot be explicitly specified using a likelihood function. We introduce a computation-based statistical approach that achieves efficient parameter estimation scalable to high-dimensional functional data. The second part focuses on developing a powerful testing method for functional data that can be used to detect important regions. We show that this approach has desirable statistical properties.
The effectiveness of this testing approach is demonstrated using two applications: the detection of regions of the spectrum that are related to pre-cancer using fluorescence spectroscopy data, and the detection of disease-related regions using brain image data. The third part focuses on analyzing brain cortical thickness data measured on the cortical surfaces of monkeys' brains during early development. Subjects are measured at misaligned time markers. Using functional data estimation and testing approaches, we are able to (1) identify asymmetric regions between the right and left hemispheres over time, and (2) identify spatial regions on the cortical surface that show increases or decreases in cortical measurements over time.
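The Westfall-Young test mentioned in this abstract controls the family-wise error rate by comparing each statistic with the permutation distribution of the maximum statistic. A minimal sketch of the single-step maxT version for a two-sample comparison, assuming each curve has already been reduced to m basis coefficients; the Welch t statistic, the two-group design, and the function names are our assumptions, and the p-value-guided compression step is not shown.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch two-sample t statistic (infinite if both groups are constant but differ)."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def maxt_adjusted_pvalues(group_a, group_b, n_perm=999, seed=0):
    """Single-step Westfall-Young maxT adjusted p-values for a two-sample comparison.

    Each observation is a list of m coordinates (e.g. basis coefficients of one curve).
    The adjusted p-value of coordinate j is the share of label permutations (the observed
    labelling included) whose maximum |t| over all coordinates reaches |t_j|.
    """
    rng = random.Random(seed)
    m = len(group_a[0])
    n_a = len(group_a)

    def abs_t_stats(a, b):
        return [abs(welch_t([row[j] for row in a], [row[j] for row in b]))
                for j in range(m)]

    observed = abs_t_stats(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    exceed = [1] * m  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled observations
        max_stat = max(abs_t_stats(pooled[:n_a], pooled[n_a:]))
        for j in range(m):
            if max_stat >= observed[j]:
                exceed[j] += 1
    return [count / (n_perm + 1) for count in exceed]


# Hypothetical data: coordinate 0 differs between groups, coordinate 1 does not.
a = [[5.0 + 0.1 * i, 0.5 * (-1) ** i] for i in range(8)]
b = [[0.1 * i, 0.5 * (-1) ** (i + 1)] for i in range(8)]
print(maxt_adjusted_pvalues(a, b))
```

Because every coordinate is compared with the same null distribution of the maximum, the procedure automatically accounts for correlation between coordinates, which is what makes it attractive for smooth functional data.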
9

Analyse de sensibilité fiabiliste avec prise en compte d'incertitudes sur le modèle probabiliste - Application aux systèmes aérospatiaux / Reliability-oriented sensitivity analysis under probabilistic model uncertainty – Application to aerospace systems

Chabridon, Vincent 26 November 2018
Les systèmes aérospatiaux sont des systèmes complexes dont la fiabilité doit être garantie dès la phase de conception au regard des coûts liés aux dégâts gravissimes qu’engendrerait la moindre défaillance. En outre, la prise en compte des incertitudes influant sur le comportement (incertitudes dites « aléatoires » car liées à la variabilité naturelle de certains phénomènes) et la modélisation de ces systèmes (incertitudes dites « épistémiques » car liées au manque de connaissance et aux choix de modélisation) permet d’estimer la fiabilité de tels systèmes et demeure un enjeu crucial en ingénierie. Ainsi, la quantification des incertitudes et sa méthodologie associée consiste, dans un premier temps, à modéliser puis propager ces incertitudes à travers le modèle numérique considéré comme une « boîte-noire ». Dès lors, le but est d’estimer une quantité d’intérêt fiabiliste telle qu’une probabilité de défaillance. Pour les systèmes hautement fiables, la probabilité de défaillance recherchée est très faible, et peut être très coûteuse à estimer. D’autre part, une analyse de sensibilité de la quantité d’intérêt vis-à-vis des incertitudes en entrée peut être réalisée afin de mieux identifier et hiérarchiser l’influence des différentes sources d’incertitudes. Ainsi, la modélisation probabiliste des variables d’entrée (incertitude épistémique) peut jouer un rôle prépondérant dans la valeur de la probabilité obtenue. Une analyse plus profonde de l’impact de ce type d’incertitude doit être menée afin de donner une plus grande confiance dans la fiabilité estimée. Cette thèse traite de la prise en compte de la méconnaissance du modèle probabiliste des entrées stochastiques du modèle. Dans un cadre probabiliste, un « double niveau » d’incertitudes (aléatoires/épistémiques) doit être modélisé puis propagé à travers l’ensemble des étapes de la méthodologie de quantification des incertitudes. 
Dans cette thèse, le traitement des incertitudes est effectué dans un cadre bayésien où la méconnaissance sur les paramètres de distribution des variables d’entrée est caractérisée par une densité a priori. Dans un premier temps, après propagation du double niveau d’incertitudes, la probabilité de défaillance prédictive est utilisée comme mesure de substitution à la probabilité de défaillance classique. Dans un deuxième temps, une analyse de sensibilité locale à base de score functions de cette probabilité de défaillance prédictive vis-à-vis des hyper-paramètres de loi de probabilité des variables d’entrée est proposée. Enfin, une analyse de sensibilité globale à base d’indices de Sobol appliqués à la variable binaire qu’est l’indicatrice de défaillance est réalisée. L’ensemble des méthodes proposées dans cette thèse est appliqué à un cas industriel de retombée d’un étage de lanceur. / Aerospace systems are complex engineering systems whose reliability has to be guaranteed from the early design phase, especially given the tremendous damage and costs that any failure could cause. Moreover, managing the various sources of uncertainty, whether they affect the behavior of these systems (“aleatory” uncertainty due to natural variability of physical phenomena) and/or their modeling and simulation (“epistemic” uncertainty due to lack of knowledge and modeling choices), is a cornerstone of their reliability assessment. Uncertainty quantification and its underlying methodology consist of several phases. First, uncertainties are modeled and propagated through the computer model, which is treated as a “black box”. Second, a quantity of interest relevant to the goal of the study (here, a failure probability) has to be estimated. For highly reliable systems, the target failure probability is very low and may be costly to estimate.
Third, a sensitivity analysis of the quantity of interest can be performed to better identify and rank the influential sources of input uncertainty. In particular, the probabilistic modeling of the input variables (epistemic uncertainty) might strongly influence the value of the failure probability estimate obtained during the reliability analysis, so a deeper investigation of the robustness of this estimate with respect to such uncertainty is needed. This thesis addresses the problem of accounting for uncertainty in the probabilistic model of the stochastic inputs. Within the probabilistic framework, a “bi-level” input uncertainty has to be modeled and propagated throughout all the steps of the uncertainty quantification methodology. In this thesis, the uncertainties are modeled within a Bayesian framework in which the lack of knowledge about the distribution parameters is characterized by a prior probability density function. In a first phase, after propagating the bi-level input uncertainty, the predictive failure probability is estimated and used as the reliability measure in place of the standard failure probability. In a second phase, a local reliability-oriented sensitivity analysis based on score functions is carried out to study the impact of the prior's hyper-parameters on the predictive failure probability estimate. Finally, a global reliability-oriented sensitivity analysis based on Sobol indices of the failure indicator function, adapted to the bi-level input uncertainty, is proposed. All the proposed methodologies are tested and challenged on a representative industrial aerospace test case simulating the fallout of a stage of an expendable space launcher.
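The predictive failure probability averages the conditional failure probability over the prior on the distribution parameters, P_f = E_theta[P(g(X) <= 0 | theta)]. A minimal Monte Carlo sketch with a hypothetical toy model (prior theta ~ N(0, 1), input X | theta ~ N(theta, 1), limit state g(x) = 3 - x), which is our assumption and not the thesis's launcher test case. Because X is then marginally N(0, 2), the exact answer is 1 - Phi(3 / sqrt(2)), about 0.017, which gives a check on the estimate.

```python
import math
import random


def predictive_failure_probability(limit_state, sample_theta, sample_x, n=200_000, seed=0):
    """Monte Carlo estimate of the predictive failure probability
    P_f = E_theta[ P(g(X) <= 0 | theta) ]: draw theta from its prior, then X given theta,
    and count how often the limit state g(X) <= 0 (failure)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n):
        theta = sample_theta(rng)
        x = sample_x(theta, rng)
        if limit_state(x) <= 0.0:
            failures += 1
    return failures / n


# Hypothetical toy model: prior theta ~ N(0, 1), X | theta ~ N(theta, 1), g(x) = 3 - x.
estimate = predictive_failure_probability(
    limit_state=lambda x: 3.0 - x,
    sample_theta=lambda rng: rng.gauss(0.0, 1.0),
    sample_x=lambda theta, rng: rng.gauss(theta, 1.0),
)
# X is marginally N(0, 2), so P(X >= 3) = 0.5 * erfc(t / sqrt(2)) with t = 3 / sqrt(2).
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0) / math.sqrt(2.0))
print(estimate, exact)
```

Drawing one X per theta (a single loop over the joint distribution) estimates the same expectation as a double loop that first computes P_f(theta) for each theta and then averages; the single loop is simply cheaper when only the predictive value is needed.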
10

Bayes Optimal Feature Selection for Supervised Learning

Saneem Ahmed, C G January 2014
The problem of feature selection is critical in several areas of machine learning and data analysis, such as cancer classification using gene expression data and text categorization. In this work, we consider feature selection for supervised learning problems, where one wishes to select a small set of features that facilitates learning a good prediction model in the reduced feature space. Our interest is primarily in filter methods, which select features independently of the learning algorithm to be used and are generally faster than other types of feature selection algorithms. Many common filter methods use information-theoretic criteria, such as those based on mutual information, to guide their search process. However, even in simple binary classification problems, mutual-information-based methods do not always select the best set of features in terms of the Bayes error. In this thesis, we develop a general approach for selecting a set of features that directly aims to minimize the Bayes error in the reduced feature space with respect to the loss or performance measure of interest. We show that the mutual-information-based criterion is a special case of our setting when the loss function of interest is the logarithmic loss for class probability estimation. We give a greedy forward algorithm for approximately optimizing this criterion and demonstrate its application to several supervised learning problems, including binary classification (with 0-1 error, cost-sensitive error, and F-measure), binary class probability estimation (with logarithmic loss), bipartite ranking (with pairwise disagreement loss), and multiclass classification (with multiclass 0-1 error). Our experiments suggest that the proposed approach is competitive with several state-of-the-art methods.
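For the 0-1 loss, the criterion is the Bayes classification error in the reduced feature space. A minimal sketch of greedy forward selection under that loss, assuming discrete features and using the empirical (plug-in) majority-vote error as a stand-in for the true Bayes error; this estimator and the function names are our simplifications, not the thesis's method.

```python
from collections import Counter, defaultdict


def plugin_bayes_error(X, y, features):
    """Empirical 0-1 Bayes error when predicting y from the given discrete features.

    Within each cell of feature values, predict the majority class; the error is the
    fraction of examples that disagree with their cell's majority.
    """
    cells = defaultdict(Counter)
    for row, label in zip(X, y):
        cells[tuple(row[f] for f in features)][label] += 1
    correct = sum(max(counter.values()) for counter in cells.values())
    return 1.0 - correct / len(y)


def greedy_forward_selection(X, y, k):
    """Add, one at a time, the feature that most reduces the plug-in Bayes error."""
    selected = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = min(remaining, key=lambda f: plugin_bayes_error(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data: feature 0 equals the label, feature 1 is noise,
# and feature 2 agrees with the label on 6 of 8 examples.
X = [[0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 1, 1],
     [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(greedy_forward_selection(X, y, 1))
```

On training data the plug-in error is optimistic, since every additional feature can only split cells further; a held-out estimate or a smoothed class-probability estimate would be needed in practice.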
