Global ETD Search

11	Shluková analýza signálu EKG / ECG Cluster Analysis Pospíšil, David January 2013 (has links) This diploma thesis deals with the use of some methods of cluster analysis on the ECG signal in order to sort QRS complexes according to their morphology to normal and abnormal. It is used agglomerative hierarchical clustering and non-hierarchical method K – Means for which an application in Mathworks MATLAB programming equipment was developed. The first part deals with the theory of the ECG signal and cluster analysis, and then the second is the design, implementation and evaluation of the results of the usage of developed software on the ECG signal for the automatic division of QRS complexes into clusters.
12	Kdy kdo mluví? / Speaker Diarization Tomášek, Pavel January 2011 (has links) This work aims at a task of speaker diarization. The goal is to implement a system which is able to decide "who spoke when". Particular components of implementation are described. The main parts are feature extraction, voice activity detection, speaker segmentation and clustering and finally also postprocessing. This work also contains results of implemented system on test data including a description of evaluation. The test data comes from the NIST RT Evaluation 2005 - 2007 and the lowest error rate for this dataset is 18.52% DER. Results are compared with diarization system implemented by Marijn Huijbregts from The Netherlands, who worked on the same data in 2009 and reached 12.91% DER.
13	Hierarchical Clustering in Risk-Based Portfolio Construction / Hierarkisk klustring för riskbaserad portföljallokering Nanakorn, Natasha, Palmgren, Elin January 2021 (has links) Following the global financial crisis, both risk-based and heuristic portfolio construction methods have received much attention from both academics and practitioners since these methods do not rely on the estimation of expected returns and as such are assumed to be more stable than Markowitz's traditional mean-variance portfolio. In 2016, Lopéz de Prado presented the Hierarchical Risk Parity (HRP), a new approach to portfolio construction which combines hierarchical clustering of assets with a heuristic risk-based allocation strategy in order to increase stability and improve out-of-sample performance. Using Monte Carlo simulations, Lopéz de Prado was able to demonstrate promising results. This thesis attempts to evaluate HRP using walk-forward analysis and historical data from equity index and bond futures, against more realistic benchmark methods and using additional performance measures relevant to practitioners. The main conclusion is that applying hierarchical clustering to risk-based portfolio construction does indeed improve the out-of-sample return and Sharpe ratio. However, the resulting portfolio is also associated with a remarkably high turnover, which may indicate numerical instability and sensitivity to estimation errors. It is also identified that Lopéz de Prado's original HRP approach has an undesirable property and alternative approaches to HRP have consequently been developed. Compared to Lopéz de Prado's original HRP approach, these alternative approaches increase the Sharpe ratio with ~10% and reduce the turnover with 60-65%. However, it should be noted that compared to more mainstream portfolios the turnover is still rather high, indicating that these alternative approaches to HRP are still somewhat unstable and sensitive to estimation errors. / Efter den globala finanskrisen har intresset för riskbaserade och heuristiska metoder för portföljallokering ökat inom såväl akademin som finansindustrin. Det ökade intresset grundar sig i att dessa metoder inte kräver estimering av förväntad avkastning och därför kan antas vara mer stabila än portföljer med grund i Markowitz moderna portföljteori. Lopéz de Prado presenterade 2016 en ny metod för portföljallokering, Hierarchical Risk Parity (HRP), som kombinerar hierarkisk klustring med en heuristisk riskbaserad portföljkonstruktion och vars syfte är att öka stabiliteten och förbättra avkastningen. Baserat på Monte Carlo-simuleringar har Lopéz de Prado lyckats påvisa lovande resultat. Syftet med detta examensarbete är att utvärdera HRP med hjälp av walk-forward-analys och empirisk data från aktieindex- och obligationsterminer. I denna utvärdering jämförs HRP med andra vanliga portföljmetoder med avseende på prestandamått relevanta för portföljförvaltare. Den huvudsakliga slutsatsen är att tillämpning av hierarkisk klustring inom ramen för riskbaserad portföljallokering förbättrar såväl den absoluta avkastningen som Sharpekvoten. Däremot är det tydligt att vikterna i en HRP-portfölj har hög omsättning över tid, vilket kan tyda på numerisk instabilitet och hög känslighet för skattningsfel. Vidare har en oönskad egenskap i Lopéz de Prados ursprungliga HRP-metod identifierats, varför två alternativa HRP-metoder har utvecklats inom ramen för examensarbetet. Jämfört med Lopéz de Prados ursprungliga metod förbättrar de två alternativa metoderna Sharpekvoten med 10% och minskar omsättningen av portföljvikterna med 60-65%. Det bör dock understrykas att även de nya metoderna har en förhållandevis hög omsättning, vilket tyder på att numerisk instabilitet och hög känslighet för skattningsfel till viss del fortfarande kvarstår. Portfolio construction asset allocation risk-based asset allocation hierarchical clustering agglomerative clustering hierarchical risk parity risk volatility Portföljallokering portföljhantering portföljmetoder riskbaserad portföljallokering hierarkisk klustring agglomerativ klustring risk volatilitet Probability Theory and Statistics Sannolikhetsteori och statistik
14	Analysis of the human corneal shape with machine learning Bouazizi, Hala 01 1900 (has links) Cette thèse cherche à examiner les conditions optimales dans lesquelles les surfaces cornéennes antérieures peuvent être efficacement pré-traitées, classifiées et prédites en utilisant des techniques de modélisation géométriques (MG) et d’apprentissage automatiques (AU). La première étude (Chapitre 2) examine les conditions dans lesquelles la modélisation géométrique peut être utilisée pour réduire la dimensionnalité des données utilisées dans un projet d’apprentissage automatique. Quatre modèles géométriques ont été testés pour leur précision et leur rapidité de traitement : deux modèles polynomiaux (P) – polynômes de Zernike (PZ) et harmoniques sphériques (PHS) – et deux modèles de fonctions rationnelles (R) : fonctions rationnelles de Zernike (RZ) et fonctions rationnelles d’harmoniques sphériques (RSH). Il est connu que les modèles PHS et RZ sont plus précis que les modèles PZ pour un même nombre de coefficients (J), mais on ignore si les modèles PHS performent mieux que les modèles RZ, et si, de manière plus générale, les modèles SH sont plus précis que les modèles R, ou l’inverse. Et prenant en compte leur temps de traitement, est-ce que les modèles les plus précis demeurent les plus avantageux? Considérant des valeurs de J (nombre de coefficients du modèle) relativement basses pour respecter les contraintes de dimensionnalité propres aux taches d’apprentissage automatique, nous avons établi que les modèles HS (PHS et RHS) étaient tous deux plus précis que les modèles Z correspondants (PZ et RR), et que l’avantage de précision conféré par les modèles HS était plus important que celui octroyé par les modèles R. Par ailleurs, les courbes de temps de traitement en fonction de J démontrent qu’alors que les modèles P sont traités en temps quasi-linéaires, les modèles R le sont en temps polynomiaux. Ainsi, le modèle SHR est le plus précis, mais aussi le plus lent (un problème qui peut en partie être remédié en appliquant une procédure de pré-optimisation). Le modèle ZP était de loin le plus rapide, et il demeure une option intéressante pour le développement de projets. SHP constitue le meilleur compromis entre la précision et la rapidité. La classification des cornées selon des paramètres cliniques a une longue tradition, mais la visualisation des effets moyens de ces paramètres sur la forme de la cornée par des cartes topographiques est plus récente. Dans la seconde étude (Chapitre 3), nous avons construit un atlas de cartes d’élévations moyennes pour différentes variables cliniques qui pourrait s’avérer utile pour l’évaluation et l’interprétation des données d’entrée (bases de données) et de sortie (prédictions, clusters, etc.) dans des tâches d’apprentissage automatique, entre autres. Une base de données constituée de plusieurs milliers de surfaces cornéennes antérieures normales enregistrées sous forme de matrices d’élévation de 101 by 101 points a d’abord été traitée par modélisation géométrique pour réduire sa dimensionnalité à un nombre de coefficients optimal dans une optique d’apprentissage automatique. Les surfaces ainsi modélisées ont été regroupées en fonction de variables cliniques de forme, de réfraction et de démographie. Puis, pour chaque groupe de chaque variable clinique, une surface moyenne a été calculée et représentée sous forme de carte d’élévations faisant référence à sa SMA (sphère la mieux ajustée). Après avoir validé la conformité de la base de donnée avec la littérature par des tests statistiques (ANOVA), l’atlas a été vérifié cliniquement en examinant si les transformations de formes cornéennes présentées dans les cartes pour chaque variable étaient conformes à la littérature. C’était le cas. Les applications possibles d’un tel atlas sont discutées. La troisième étude (Chapitre 4) traite de la classification non-supervisée (clustering) de surfaces cornéennes antérieures normales. Le clustering cornéen un domaine récent en ophtalmologie. La plupart des études font appel aux techniques d’extraction des caractéristiques pour réduire la dimensionnalité de la base de données cornéennes. Le but est généralement d’automatiser le processus de diagnostique cornéen, en particulier en ce qui a trait à la distinction entre les cornées normales et les cornées irrégulières (kératocones, Fuch, etc.), et dans certains cas, de distinguer différentes sous-classes de cornées irrégulières. L’étude de clustering proposée ici se concentre plutôt sur les cornées normales afin de mettre en relief leurs regroupements naturels. Elle a recours à la modélisation géométrique pour réduire la dimensionnalité de la base de données, utilisant des polynômes de Zernike, connus pour leur interprétativité transparente (chaque terme polynomial est associé à une caractéristique cornéenne particulière) et leur bonne précision pour les cornées normales. Des méthodes de différents types ont été testées lors de prétests (méthodes de clustering dur (hard) ou souple (soft), linéaires or non-linéaires. Ces méthodes ont été testées sur des surfaces modélisées naturelles (non-normalisées) ou normalisées avec ou sans traitement d’extraction de traits, à l’aide de différents outils d’évaluation (scores de séparabilité et d’homogénéité, représentations par cluster des coefficients de modélisation et des surfaces modélisées, comparaisons statistiques des clusters sur différents paramètres cliniques). Les résultats obtenus par la meilleure méthode identifiée, k-means sans extraction de traits, montrent que les clusters produits à partir de surfaces cornéennes naturelles se distinguent essentiellement en fonction de la courbure de la cornée, alors que ceux produits à partir de surfaces normalisées se distinguent en fonction de l’axe cornéen. La dernière étude présentée dans cette thèse (Chapitre 5) explore différentes techniques d’apprentissage automatique pour prédire la forme de la cornée à partir de données cliniques. La base de données cornéennes a d’abord été traitée par modélisation géométrique (polynômes de Zernike) pour réduire sa dimensionnalité à de courts vecteurs de 12 à 20 coefficients, une fourchette de valeurs potentiellement optimales pour effectuer de bonnes prédictions selon des prétests. Différentes méthodes de régression non-linéaires, tirées de la bibliothèque scikit-learn, ont été testées, incluant gradient boosting, Gaussian process, kernel ridge, random forest, k-nearest neighbors, bagging, et multi-layer perceptron. Les prédicteurs proviennent des variables cliniques disponibles dans la base de données, incluant des variables géométriques (diamètre horizontal de la cornée, profondeur de la chambre cornéenne, côté de l’œil), des variables de réfraction (cylindre, sphère et axe) et des variables démographiques (âge, genre). Un test de régression a été effectué pour chaque modèle de régression, défini comme la sélection d’une des 256 combinaisons possibles de variables cliniques (les prédicteurs), d’une méthode de régression, et d’un vecteur de coefficients de Zernike d’une certaine taille (entre 12 et 20 coefficients, les cibles). Tous les modèles de régression testés ont été évalués à l’aide de score de RMSE établissant la distance entre les surfaces cornéennes prédites (les prédictions) et vraies (les topographies corn¬éennes brutes). Les meilleurs d’entre eux ont été validés sur l’ensemble de données randomisé 20 fois pour déterminer avec plus de précision lequel d’entre eux est le plus performant. Il s’agit de gradient boosting utilisant toutes les variables cliniques comme prédicteurs et 16 coefficients de Zernike comme cibles. Les prédictions de ce modèle ont été évaluées qualitativement à l’aide d’un atlas de cartes d’élévations moyennes élaborées à partir des variables cliniques ayant servi de prédicteurs, qui permet de visualiser les transformations moyennes d’en groupe à l’autre pour chaque variables. Cet atlas a permis d’établir que les cornées prédites moyennes sont remarquablement similaires aux vraies cornées moyennes pour toutes les variables cliniques à l’étude. / This thesis aims to investigate the best conditions in which the anterior corneal surface of normal corneas can be preprocessed, classified and predicted using geometric modeling (GM) and machine learning (ML) techniques. The focus is on the anterior corneal surface, which is the main responsible of the refractive power of the cornea. Dealing with preprocessing, the first study (Chapter 2) examines the conditions in which GM can best be applied to reduce the dimensionality of a dataset of corneal surfaces to be used in ML projects. Four types of geometric models of corneal shape were tested regarding their accuracy and processing time: two polynomial (P) models – Zernike polynomial (ZP) and spherical harmonic polynomial (SHP) models – and two corresponding rational function (R) models – Zernike rational function (ZR) and spherical harmonic rational function (SHR) models. SHP and ZR are both known to be more accurate than ZP as corneal shape models for the same number of coefficients, but which type of model is the most accurate between SHP and ZR? And is an SHR model, which is both an SH model and an R model, even more accurate? Also, does modeling accuracy comes at the cost of the processing time, an important issue for testing large datasets as required in ML projects? Focusing on low J values (number of model coefficients) to address these issues in consideration of dimensionality constraints that apply in ML tasks, it was found, based on a number of evaluation tools, that SH models were both more accurate than their Z counterparts, that R models were both more accurate than their P counterparts and that the SH advantage was more important than the R advantage. Processing time curves as a function of J showed that P models were processed in quasilinear time, R models in polynomial time, and that Z models were fastest than SH models. Therefore, while SHR was the most accurate geometric model, it was the slowest (a problem that can partly be remedied by applying a preoptimization procedure). ZP was the fastest model, and with normal corneas, it remains an interesting option for testing and development, especially for clustering tasks due to its transparent interpretability. The best compromise between accuracy and speed for ML preprocessing is SHP. The classification of corneal shapes with clinical parameters has a long tradition, but the visualization of their effects on the corneal shape with group maps (average elevation maps, standard deviation maps, average difference maps, etc.) is relatively recent. In the second study (Chapter 3), we constructed an atlas of average elevation maps for different clinical variables (including geometric, refraction and demographic variables) that can be instrumental in the evaluation of ML task inputs (datasets) and outputs (predictions, clusters, etc.). A large dataset of normal adult anterior corneal surface topographies recorded in the form of 101×101 elevation matrices was first preprocessed by geometric modeling to reduce the dimensionality of the dataset to a small number of Zernike coefficients found to be optimal for ML tasks. The modeled corneal surfaces of the dataset were then grouped in accordance with the clinical variables available in the dataset transformed into categorical variables. An average elevation map was constructed for each group of corneal surfaces of each clinical variable in their natural (non-normalized) state and in their normalized state by averaging their modeling coefficients to get an average surface and by representing this average surface in reference to the best-fit sphere in a topographic elevation map. To validate the atlas thus constructed in both its natural and normalized modalities, ANOVA tests were conducted for each clinical variable of the dataset to verify their statistical consistency with the literature before verifying whether the corneal shape transformations displayed in the maps were themselves visually consistent. This was the case. The possible uses of such an atlas are discussed. The third study (Chapter 4) is concerned with the use of a dataset of geometrically modeled corneal surfaces in an ML task of clustering. The unsupervised classification of corneal surfaces is recent in ophthalmology. Most of the few existing studies on corneal clustering resort to feature extraction (as opposed to geometric modeling) to achieve the dimensionality reduction of the dataset. The goal is usually to automate the process of corneal diagnosis, for instance by distinguishing irregular corneal surfaces (keratoconus, Fuch, etc.) from normal surfaces and, in some cases, by classifying irregular surfaces into subtypes. Complementary to these corneal clustering studies, the proposed study resorts mainly to geometric modeling to achieve dimensionality reduction and focuses on normal adult corneas in an attempt to identify their natural groupings, possibly in combination with feature extraction methods. Geometric modeling was based on Zernike polynomials, known for their interpretative transparency and sufficiently accurate for normal corneas. Different types of clustering methods were evaluated in pretests to identify the most effective at producing neatly delimitated clusters that are clearly interpretable. Their evaluation was based on clustering scores (to identify the best number of clusters), polar charts and scatter plots (to visualize the modeling coefficients involved in each cluster), average elevation maps and average profile cuts (to visualize the average corneal surface of each cluster), and statistical cluster comparisons on different clinical parameters (to validate the findings in reference to the clinical literature). K-means, applied to geometrically modeled surfaces without feature extraction, produced the best clusters, both for natural and normalized surfaces. While the clusters produced with natural corneal surfaces were based on the corneal curvature, those produced with normalized surfaces were based on the corneal axis. In each case, the best number of clusters was four. The importance of curvature and axis as grouping criteria in corneal data distribution is discussed. The fourth study presented in this thesis (Chapter 5) explores the ML paradigm to verify whether accurate predictions of normal corneal shapes can be made from clinical data, and how. The database of normal adult corneal surfaces was first preprocessed by geometric modeling to reduce its dimensionality into short vectors of 12 to 20 Zernike coefficients, found to be in the range of appropriate numbers to achieve optimal predictions. The nonlinear regression methods examined from the scikit-learn library were gradient boosting, Gaussian process, kernel ridge, random forest, k-nearest neighbors, bagging, and multilayer perceptron. The predictors were based on the clinical variables available in the database, including geometric variables (best-fit sphere radius, white-towhite diameter, anterior chamber depth, corneal side), refraction variables (sphere, cylinder, axis) and demographic variables (age, gender). Each possible combination of regression method, set of clinical variables (used as predictors) and number of Zernike coefficients (used as targets) defined a regression model in a prediction test. All the regression models were evaluated based on their mean RMSE score (establishing the distance between the predicted corneal surfaces and the raw topographic true surfaces). The best model identified was further qualitatively assessed based on an atlas of predicted and true average elevation maps by which the predicted surfaces could be visually compared to the true surfaces on each of the clinical variables used as predictors. It was found that the best regression model was gradient boosting using all available clinical variables as predictors and 16 Zernike coefficients as targets. The most explicative predictor was the best-fit sphere radius, followed by the side and refractive variables. The average elevation maps of the true anterior corneal surfaces and the predicted surfaces based on this model were remarkably similar for each clinical variable. cornea Zernike polynomials spherical harmonics rational functions corneal topography atlas average elevation map clustering gradient boosting Cornée Polynômes de Zernike Harmoniques sphériques Fonctions rationnelles Topographie cornéenne Cartes d’élévations moyennes Clustering agglomératif Clustering en k-moyennes Clustering spectral Régression Processus gaussien Kernel ridge Random forest K-nearest neighbors Bagging Agglomerative clustering K-means clustering Spectral clustering Gaussian process

Page generated in 0.143 seconds