1 |
A Risk-Oriented Clustering Approach for Asset Categorization and Risk Measurement. Liu, Lu. 18 July 2019.
When faced with market risk for investments and portfolios, practitioners often compute a risk measure, a mapping that assigns a real number to each random payoff. There are many ways to quantify potential risk, and the most important input is the distribution of future performance. Future distributions are unknown and are therefore estimated from historical Profit and Loss (P&L) distributions. However, past data may not be appropriate for estimating the future, and risk measures generated from a single historical distribution are subject to error. To overcome these shortcomings, a natural remedy is to identify and categorize similar assets whose P&L distributions can serve as alternative scenarios.
In practice, one of the most common and intuitive categorizations is by sector, based on industry. It is widely agreed that companies in the same sector share the same, or related, business types and operating characteristics. In the field of risk management, however, sector-based categorization does not necessarily group assets by their risk profiles, and we show that risk measures within the same sector tend to vary widely. Although improved risk measures that account for distribution ambiguity have been discussed at length, we seek a more risk-oriented categorization by providing a new clustering approach. Our method can also better inform us of the potential risk and the extreme worst-case scenario within each category.
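The abstract does not specify the clustering procedure itself, so the following is only a minimal sketch of the idea of risk-oriented grouping: it clusters assets by the 1-D Wasserstein distance between their historical P&L samples (average-linkage hierarchical clustering) and reports a worst-case historical CVaR per cluster. The distance, the linkage method, and the CVaR level are illustrative assumptions, not the thesis's method.

```python
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def historical_cvar(pnl, alpha=0.95):
    """Historical CVaR: mean loss beyond the alpha-quantile of losses."""
    losses = -np.asarray(pnl)
    var = np.quantile(losses, alpha)
    return losses[losses >= var].mean()

def risk_oriented_clusters(pnl_samples, n_clusters):
    """Group assets whose historical P&L distributions are close in
    1-D Wasserstein distance, so that cluster members can serve as
    alternative scenarios for one another."""
    n = len(pnl_samples)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = wasserstein_distance(pnl_samples[i], pnl_samples[j])
            dist[i, j] = dist[j, i] = d
    # Average-linkage hierarchical clustering on the condensed distances.
    return fcluster(linkage(squareform(dist), method="average"),
                    t=n_clusters, criterion="maxclust")

# Toy example: 6 assets, two risk regimes (thin vs. heavy tails).
rng = np.random.default_rng(0)
pnl = [rng.normal(0, 1, 500) for _ in range(3)] + \
      [rng.standard_t(3, 500) * 2 for _ in range(3)]
labels = risk_oriented_clusters(pnl, n_clusters=2)
for c in sorted(set(labels)):
    members = [i for i in range(len(pnl)) if labels[i] == c]
    print(c, max(historical_cvar(pnl[i]) for i in members))  # worst case per cluster
```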
|
2 |
Robust Exact Algorithms for the Euclidean Bipartite Matching Problem. Gattani, Akshaykumar Gopalkrishna. 06 July 2023.
The minimum cost bipartite matching problem is a well-studied optimization problem in computer science and operations research, with wide-ranging applications in fields such as machine learning, economics, transportation, logistics and biology. A special instance of this problem is the computation of the p-Wasserstein distance, which we define next. Given a complete bipartite graph with two disjoint sets of n points in d-dimensional Euclidean space and an integer p ≥ 1, let the cost of an edge be the p-th power of the Euclidean distance between its endpoints. The objective of this problem is to find a minimum-cost matching in this complete bipartite graph. The Hungarian algorithm is a classical method that solves this problem in O(n^3) time. There are many algorithms with a run time better than that of the Hungarian algorithm when the graphs have non-negative integer edge costs bounded by C. Since the input points have real-valued coordinates and the Euclidean distances can be irrational, such algorithms only return an approximate matching. Thus, the Hungarian algorithm remains the fastest known algorithm to compute an exact matching. In this thesis, we implement a new algorithm in the divide and conquer framework that computes the exact p-Wasserstein distance and has a run time asymptotically better than the Hungarian algorithm for stochastic point sets. Inspired by the techniques used in this algorithm, we also design an alternate version of the Hungarian algorithm that uses a grid-based approach. Our experimental analysis shows that both of our algorithms significantly outperform the classical Hungarian algorithm. / Master of Science / Suppose we have two sets with an equal number of items and a list of compatible pairs of items, where a pair is considered compatible if its items belong to different sets. A perfect matching is a subset of compatible pairs in which each item is paired with exactly one other item. When trying to find a perfect matching, there may be multiple options, and minimizing the cost of the perfect matching is often desired. This is referred to as the minimum cost bipartite matching problem, which is extensively studied due to its importance in algorithmic theory and operations research. A special instance of this problem is the calculation of the p-Wasserstein distance, which has many practical applications in fields such as machine learning, economics, transportation, logistics and biology. The Hungarian algorithm is the only known algorithm that computes the exact p-Wasserstein distance. Therefore, our focus is to develop exact algorithms for this problem that perform better than the Hungarian algorithm and can scale efficiently to large datasets.
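A minimal sketch of the computation the abstract defines: the p-Wasserstein distance between two equal-size point sets via a minimum-cost perfect matching. It uses SciPy's linear_sum_assignment, which is a Jonker-Volgenant-style assignment solver rather than the textbook Hungarian algorithm, but it solves the same problem exactly up to floating-point arithmetic. The 1/n normalization for empirical uniform measures is an assumption of this sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def p_wasserstein(A, B, p=2):
    """p-Wasserstein distance between equal-size point sets A, B of
    shape (n, d): solve a min-cost perfect matching on the complete
    bipartite graph with edge costs ||a - b||^p."""
    cost = cdist(A, B) ** p                   # n x n p-th power distances
    rows, cols = linear_sum_assignment(cost)  # optimal assignment
    return (cost[rows, cols].sum() / len(A)) ** (1.0 / p)

rng = np.random.default_rng(1)
A = rng.random((100, 2))   # 100 random points in the unit square
B = rng.random((100, 2))
print(p_wasserstein(A, B, p=2))
```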
|
3 |
On the calibration of Lévy driven time series with coupling distances: an application in paleoclimate. Gairing, Jan; Högele, Michael; Kosenkova, Tetiana; Kulik, Alexei. January 2014.
This article aims at the statistical assessment of time series with large fluctuations over short times, which are assumed to stem from a continuous process perturbed by a Lévy process exhibiting heavy-tail behavior. We propose an easily implementable procedure to efficiently estimate the statistical difference between the noisy behavior of the data and a given reference jump measure, in terms of so-called coupling distances. After a short introduction to Lévy processes and coupling distances, we recall basic statistical approximation results and derive rates of convergence. The procedure is then elaborated in detail in an abstract setting and applied in a case study to simulated and paleoclimate data. The study indicates the dominant presence of a non-stable heavy-tailed Lévy jump component with a tail index greater than 2.
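Coupling distances have no off-the-shelf implementation, so as a stand-in for the heavy-tail diagnostic in the case study, the sketch below applies a classical Hill estimator to the absolute increments of a simulated jump-perturbed series. The choice of order statistics k and the Pareto jump model are illustrative assumptions, not the article's procedure.

```python
import numpy as np

def hill_tail_index(x, k=100):
    """Hill estimator of the tail index from the k largest absolute
    increments; an estimate above 2 is consistent with a non-stable
    heavy-tailed jump component, as in the paleoclimate case study."""
    incr = np.abs(np.diff(np.asarray(x, dtype=float)))
    order = np.sort(incr)[::-1]        # decreasing order statistics
    top, threshold = order[:k], order[k]
    return 1.0 / np.mean(np.log(top / threshold))

rng = np.random.default_rng(2)
# Toy series: small Gaussian increments plus rare Pareto-type jumps.
jumps = rng.pareto(2.5, 5000) * (rng.random(5000) < 0.01)
series = np.cumsum(rng.normal(0, 0.1, 5000) + jumps)
print(hill_tail_index(series, k=50))   # roughly 2.5 on average
```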
|
4 |
Wasserstein Distance on Finite Spaces: Statistical Inference and Algorithms. Sommerfeld, Max. 18 October 2017.
No description available.
|
5 |
Transportation Techniques for Geometric Clustering. January 2020.
This thesis introduces new techniques for clustering distributional data according to their geometric similarities. This work builds upon the optimal transportation (OT) problem, which seeks the global minimum cost for matching distributional data, and leverages the connection between OT and power diagrams to solve different clustering problems. The OT formulation is based on the variational principle to differentiate hard cluster assignments, a capability that was missing in the literature. This thesis shows multiple techniques to regularize and generalize OT to cope with various tasks including clustering, aligning, and interpolating distributional data. It also discusses the connections of the new formulation to other OT and clustering formulations, to better understand their gaps and the means to close them. Finally, this thesis demonstrates the advantages of the proposed OT techniques in solving machine learning problems and their downstream applications in computer graphics, computer vision, and image processing. / Dissertation/Thesis / Doctoral Dissertation Computer Engineering 2020
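The abstract does not spell out the variational formulation, so the following is only a minimal sketch of the OT-clustering connection it builds on: a Lloyd-style clustering whose assignment step is a balanced discrete transport problem (each cluster must receive exactly n/k points, enforced by replicating each centroid and solving a min-cost assignment); the resulting cells are power-diagram cells. The equal-capacity constraint and the solver are assumptions of this sketch, not the author's method.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def balanced_ot_kmeans(X, k, iters=20, seed=0):
    """Lloyd iterations where assignment is a balanced optimal
    transport problem: replicate each centroid n/k times and solve a
    min-cost perfect matching between points and centroid replicas."""
    n, d = X.shape
    assert n % k == 0, "toy sketch assumes n divisible by k"
    m = n // k
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(n, k, replace=False)]
    for _ in range(iters):
        cost = cdist(X, np.repeat(centers, m, axis=0)) ** 2
        _, cols = linear_sum_assignment(cost)   # balanced OT plan
        labels = cols // m                       # replica index -> cluster
        centers = np.stack([X[labels == c].mean(0) for c in range(k)])
    return labels, centers

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in (0, 3, 6)])
labels, centers = balanced_ot_kmeans(X, k=3)   # three equal-size clusters
```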
|
6 |
Empirical Optimal Transport on Discrete Spaces: Limit Theorems, Distributional Bounds and Applications. Tameling, Carla. 11 December 2018.
No description available.
|
7 |
Distances within and between Metric Spaces: Metric Geometry, Optimal Transport and Applications to Data Analysis. Wan, Zhengchao. January 2021.
No description available.
|
8 |
High dimensional Markov chain Monte Carlo methods: theory, methods and applications. Durmus, Alain. 02 December 2016.
The subject of this thesis is the analysis of Markov Chain Monte Carlo (MCMC) methods and the development of new methodologies to sample from high dimensional distributions. Our work is divided into three main topics. The first problem addressed in this manuscript is the convergence of Markov chains in Wasserstein distance. Geometric and sub-geometric convergence bounds, with explicit constants, are derived under appropriate conditions. These results are then applied to the study of MCMC algorithms. The first algorithm analyzed is an alternative scheme to the Metropolis Adjusted Langevin Algorithm (MALA), for which explicit geometric convergence bounds are established. The second method is the pre-conditioned Crank-Nicolson algorithm; it is shown that, under mild assumptions, the Markov chain associated with this algorithm is sub-geometrically ergodic in an appropriate Wasserstein distance. The second topic of this thesis is the study of the Unadjusted Langevin Algorithm (ULA). We are first interested in explicit convergence bounds in total variation under different kinds of assumptions on the potential associated with the target distribution. In particular, we pay attention to the dependence of the algorithm on the dimension of the state space. The case of fixed step sizes as well as the case of nonincreasing sequences of step sizes are dealt with. When the target density is strongly log-concave, explicit bounds in Wasserstein distance are established. These results are then used to derive new bounds in total variation distance which improve on those previously derived under weaker conditions on the target density. The last part tackles new optimal scaling results for Metropolis-Hastings type algorithms. First, we extend the pioneering result on the optimal scaling of the random walk Metropolis algorithm to target densities which are differentiable in Lp mean for p ≥ 2. Then, we derive new Metropolis-Hastings type algorithms which have a better optimal scaling than the MALA algorithm. Finally, the stability and convergence in total variation of these new algorithms are studied.
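A minimal sketch of the Unadjusted Langevin Algorithm analyzed in the second part: the Euler-Maruyama discretization of the Langevin diffusion, with no Metropolis correction, applied to a strongly log-concave toy target (a standard Gaussian, so grad log pi(x) = -x). The step size, dimension, and run length are arbitrary illustration choices.

```python
import numpy as np

def ula_sample(grad_log_pi, x0, step, n_steps, rng):
    """ULA: x_{k+1} = x_k + step * grad log pi(x_k) + sqrt(2 step) Z_k.
    Without an accept/reject step the chain targets pi only up to a
    discretization bias controlled by the step size."""
    d = len(x0)
    xs = np.empty((n_steps, d))
    x = np.asarray(x0, dtype=float)
    for i in range(n_steps):
        x = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * rng.normal(size=d)
        xs[i] = x
    return xs

rng = np.random.default_rng(4)
samples = ula_sample(lambda x: -x, x0=np.zeros(50), step=0.05,
                     n_steps=10_000, rng=rng)
print(samples[2000:].var())   # close to 1, with a small discretization bias
```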
|
9 |
Bootstrap in high dimensional spaces. Buzun, Nazar. 28 January 2021.
The objective of this thesis is to explore theoretical properties of various bootstrap methods. We introduce convergence rates of the bootstrap procedure, which correspond to the difference between the real distribution of some statistic and its resampling approximation.
In this work we analyze the distribution of the Euclidean norm of a sum of independent vectors, the maximum of a sum in high dimension, the Wasserstein distance between empirical measures, and Wasserstein barycenters. In order to prove bootstrap convergence we use the Gaussian approximation technique: one has to find a sum of independent vectors in the considered statistic such that the bootstrap yields a resampling of this sum. This sum may then be approximated by a Gaussian distribution and compared with the resampling distribution through the difference between their covariance matrices.
In general it appears to be very difficult to reveal such a sum of independent vectors, because some statistics (for example, the MLE) don't have an explicit equation and may be infinite-dimensional. In order to handle this difficulty we use some novel results from statistical learning theory, which provide a finite-sample quadratic approximation of the likelihood and a suitable MLE representation. In the last chapter we consider the MLE of the Wasserstein barycenters model. The regularised barycenters model has bounded derivatives and satisfies the necessary conditions for quadratic approximation.
Furthermore, we apply the bootstrap to change point detection methods. In the parametric case we analyse the Likelihood Ratio Test (LRT) statistic; its high values indicate changes of the parametric distribution in the data sequence. The maximum of the LRT has a complex distribution, but its quantiles may be calibrated by means of the bootstrap. We show convergence rates of the bootstrap quantiles to the real quantiles of the LRT distribution. In the non-parametric case, instead of the LRT we use the Wasserstein distance between empirical measures. We test the accuracy of the change point detection methods on synthetic time series and electrocardiography (ECG) data. Experiments with ECG illustrate the advantages of the non-parametric approach over complex parametric models and the LRT.
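A toy sketch of the non-parametric idea: scan a series with a max-over-windows 1-D Wasserstein statistic and calibrate the detection threshold by bootstrap. Pooled i.i.d. resampling is assumed as a crude null approximation; the window width, number of replicates, and scan statistic are illustrative choices, not the thesis's procedure.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def max_window_stat(x, w):
    """Scan statistic: max over t of the 1-D Wasserstein distance
    between the w samples before and after t."""
    return max(wasserstein_distance(x[t - w:t], x[t:t + w])
               for t in range(w, len(x) - w + 1))

def bootstrap_threshold(x, w, level=0.95, B=100, rng=None):
    """Calibrate the threshold by resampling x with replacement, which
    mimics the no-change null in this exchangeable toy setting."""
    rng = rng or np.random.default_rng()
    stats = [max_window_stat(rng.choice(x, size=len(x), replace=True), w)
             for _ in range(B)]
    return np.quantile(stats, level)

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(1.5, 1, 300)])
w = 50
print(max_window_stat(x, w) > bootstrap_threshold(x, w, rng=rng))  # True: flagged
```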
|
10 |
Semi-Supervised Semantic Segmentation for Agricultural Aerial Images. Chen-yi Lu. 01 May 2023.
Unmanned Aerial Systems (UAS) have been an essential tool for field scouting, nutrient applications, and farm management. However, assessing the aerial images captured by UAS is labor-intensive, and human assessment can be misleading and introduce bias. Deep learning based image segmentation has been proposed to assist in segmenting different areas of interest in the field, but it usually requires significant pixel-level annotated data. To address this, we propose a semi-supervised learning algorithm, AgSemSeg, to train a robust image segmentation model with less annotated data. Semi-supervised semantic segmentation aims to predict accurate pixel-level segmentation results by incorporating unlabeled images. Existing methods rely on computing the consistency loss on the output predictions between pseudo-labels and unlabeled images. In AgSemSeg, we exploit the intermediate feature representations, rather than only the output predictions, to improve the overall performance of the model. Specifically, we add a projection layer on the output of the backbone encoder, and inject a consistency loss between intermediate feature representations using the Sliced-Wasserstein distance. We evaluate AgSemSeg on the Agriculture-Vision dataset and outperform the supervised baseline by up to 9.71%. We also evaluate AgSemSeg on benchmark datasets such as the PASCAL VOC 2012 and Cityscapes datasets, where it outperforms supervised baselines by up to 24.6% and 7.5% mIoU, respectively. We also perform extensive ablation studies to show that our proposed components are key to the performance improvements of our method.
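The abstract does not give the exact form of the consistency loss, so the following is only a minimal NumPy sketch of the Sliced-Wasserstein distance between two equal-size batches of feature vectors (e.g. projected encoder outputs). The number of projections and the feature shapes are illustrative assumptions.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=64, rng=None):
    """Monte Carlo sliced 2-Wasserstein distance between equal-size
    feature batches X, Y of shape (n, d): project onto random unit
    directions, then use the closed-form 1-D distance (sorted
    projections matched in order)."""
    rng = rng or np.random.default_rng()
    theta = rng.normal(size=(n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    px = np.sort(X @ theta.T, axis=0)   # (n, n_proj) sorted projections
    py = np.sort(Y @ theta.T, axis=0)
    return np.sqrt(np.mean((px - py) ** 2))

rng = np.random.default_rng(6)
feats_a = rng.normal(0.0, 1.0, (256, 128))   # e.g. flattened encoder features
feats_b = rng.normal(0.5, 1.0, (256, 128))
print(sliced_wasserstein(feats_a, feats_b, rng=rng))
```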
|