Global ETD Search

1	Statistical modelling of spatio-temporal dependencies in NGS data Ranciati, Saverio <1988> January 1900 Next-generation sequencing (NGS) has rapidly become the current standard in genetics-related analysis. This switch from microarrays to NGS required new statistical strategies to address the research questions inherent to the phenomena under study. First and foremost, NGS datasets usually consist of discrete observations characterized by overdispersion - that is, a discrepancy between expected and observed variability - and an abundance of zeros, measured across a huge number of regions of the genome. With respect to chromatin immunoprecipitation sequencing (ChIP-Seq), a class of NGS data, the primary focus is to discover the underlying (unobserved) pattern of 'enrichment': more specifically, there is interest in the interactions between genes (or broader regions of the genome) and proteins, as they describe the mechanism of regulation under different conditions such as healthy or damaged tissue. Another interesting research question involves the clustering of these observations into groups that have practical relevance and interpretability, considering in particular that a single unit could be allocated to more than one of these clusters, as it is reasonable to assume that its participation is not exclusive to one and only one biological function and/or mechanism. Many of these complex processes can also be described by sets of ordinary differential equations (ODEs), which are mathematical representations of the changes of a system through time, following dynamics governed by the parameters of interest. In this thesis, we address the aforementioned tasks and research questions employing different statistical strategies, such as model-based clustering, graphical models, penalized smoothing and regression.
We propose extensions of the existing approaches to better fit the problem at hand and we elaborate the methodology in a Bayesian environment, with the focus on incorporating the structural dependencies - both spatial and temporal - of the data at our disposal. SECS-S/01 Statistica
2	Statistical Inference in Open Quantum Systems Novelli, Marco <1985> January 1900 This thesis concerns the statistical analysis of open quantum systems subject to an external and non-stationary perturbation. In the first paper, a generalization of the explicit-duration hidden Markov model (EDHMM) that takes into account the presence of sparse data is presented. Introducing a kernel estimator in the estimation procedure increases the accuracy of the estimates, and thus yields more reliable information about the evolution of the unobservable system. A generalization of the Viterbi algorithm to the EDHMM is developed. In the second paper, we develop a Markov Chain Monte Carlo (MCMC) procedure for estimating the EDHMM. We improve the flexibility of our formulation by adopting a Bayesian model selection procedure, which avoids a direct specification of the number of states of the hidden chain. Motivated by the presence of sparsity, we make use of a non-parametric estimator to obtain more accurate estimates of the model parameters. The formulation presented turns out to be straightforward to implement and robust against the underflow problem, and it provides accurate estimates of the parameters. In the third paper, an extension of the Cramér-Rao inequality for quantum discrete parameter models is derived. These are models in which the parameter space is restricted to a finite set of points. Indeed, in some estimation problems, theory provides additional information that allows us to restrict the parameter space to a finite set of points. The extension presented sets the ultimate accuracy of an estimator and determines a discrete counterpart of the quantum Fisher information. This is particularly useful in the many experiments in which the parameters can assume only a few different values: for example, the direction in which the magnetic field points. We also provide an illustration related to a quantum optics problem.
SECS-S/01 Statistica
3	Statistical Analysis of a Close Von Karman Flow Pons, Flavio Maria Emanuele <1986> January 1900 This thesis addresses the statistical modeling of turbulence, focusing on three main aspects: the critical transition from laminarity to turbulence, the effects of so-called intermittency, and the energy dynamics of a turbulent flow. The central part of the thesis consists of six papers, divided into two parts. In Part I we develop two new indices to quantify the proximity to critical transitions in stochastic dynamical systems, with particular attention to the transition from laminarity to turbulence in fluids (Paper A). The two indices are tested on two toy models and then applied to the detection of critical events in a magnetised fluid and in financial time series. We define a third index Y, which quantifies the effects of intermittency and does not require very long time series. This index turns out to be effective in recovering the structure of the turbulent flow (Papers B, C). In Paper D we show that Y is also sensitive to the turbulent behavior of financial markets, providing a possible early warning indicator of the proximity to critical events. In Part II we introduce a new local observable, the arrival times of tracer particles at a particular point in the fluid, as a proxy for the turbulent velocity field. We model the universal self-organising structure of this observable in an effective and parsimonious way. In the second paper of Part II, we model the continuous-time dynamics of the energy budget of the turbulent field. We show that this observable can be characterised as the exponential of a stochastic integral on a Lévy basis, under the assumption that the energy transmission across time scales is a multiplicative cascade process. SECS-S/01 Statistica
4	Large Covariance Matrix Estimation by Composite Minimization Farne', Matteo <1988> January 1900 This thesis concerns large covariance matrix estimation via composite minimization under the assumption of a low rank plus sparse structure. Existing methods such as POET (Principal Orthogonal complEment Thresholding) perform estimation by extracting principal components and then applying a soft thresholding algorithm. In contrast, our method recovers the low rank plus sparse decomposition of the covariance matrix by least squares minimization under nuclear norm plus $l_1$ norm penalization. This non-smooth convex minimization procedure is based on semidefinite programming and subdifferential methods, resulting in two separable problems solved by a singular value thresholding plus soft thresholding algorithm. The most recent estimator in the literature is called LOREC (Low Rank and sparsE Covariance estimator) and provides non-asymptotic error rates as well as identifiability conditions in the context of algebraic geometry. Our work shows that the unshrinkage of the estimated eigenvalues of the low rank component improves the performance of LOREC considerably. The same method also recovers covariance structures with very spiked latent eigenvalues, as in the POET setting, thus overcoming the necessary condition $p\leq n$. In addition, it is proved that our method recovers structures with intermediate degrees of spikiness, obtaining a loss which is bounded accordingly. Then, an ad hoc model selection criterion that detects the optimal point in terms of composite penalty is proposed. Empirical results from a wide original simulation study, in which various low rank plus sparse settings are simulated under different parameter values, are described, outlining in detail the improvements upon existing methods. Finally, two real datasets are explored, highlighting the usefulness of our method in practical applications. SECS-S/01 Statistica
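The "singular value thresholding plus soft thresholding" step mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's estimator: it assumes NumPy, assumes the objective 0.5*||S - L - Sp||_F^2 + psi*||L||_* + rho*||Sp||_1, and uses plain alternating minimization. The function names and the parameters psi, rho and n_iter are hypothetical, and the unshrinkage step and model selection criterion described in the abstract are not included.

```python
import numpy as np

def soft_threshold(M, tau):
    """Entrywise soft thresholding: proximal operator of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def singular_value_threshold(M, tau):
    """Soft-threshold the singular values: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scale each column of U by its shrunken singular value, then recompose.
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def low_rank_plus_sparse(S, psi, rho, n_iter=200):
    """Alternate the two proximal steps on an input matrix S (assumed scheme).

    psi and rho are hypothetical names for the nuclear norm and l1 penalties.
    Returns the low rank part L and the sparse part Sp.
    """
    S = np.asarray(S, dtype=float)
    L = np.zeros_like(S)
    Sp = np.zeros_like(S)
    for _ in range(n_iter):
        L = singular_value_threshold(S - Sp, psi)   # update low rank block
        Sp = soft_threshold(S - L, rho)             # update sparse block
    return L, Sp
```

With both penalties set to zero the first step reproduces the input matrix and the sparse part stays at zero, which gives a quick sanity check of the two operators.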
5	Item Response Theory models for the competence evaluation: towards a multidimensional approach in the University guidance Matteucci, Mariagiulia <1980> 26 March 2007 No description available. SECS-S/01 Statistica
6	Automated Local Linear Embedding with an application to microarray data Grilli, Elisa <1977> 26 March 2007 No description available. SECS-S/01 Statistica
7	Analisi spaziale della longevità in Emilia-Romagna [Spatial analysis of longevity in Emilia-Romagna] Marino, Massimiliano <1973> 02 April 2008 In recent years longevity has become a topic of considerable interest in several scientific fields. Research into the mechanisms governing the factors of longevity has multiplied recently, involving, in different ways, some regions of Italy. The study presented in this thesis aims to identify territorial aggregations characterized by a significant propensity for longevity in the Emilia-Romagna region, using spatial clustering methodologies, some of them recently implemented. The population under study consists of the individuals resident in Emilia-Romagna in the five-year period 2000-2004, broken down by age class, sex and municipality. The analysis is purely spatial, with the municipality as the elementary geographical unit, and was conducted separately for the two sexes. Areas of high longevity were identified using four spatial clustering methodologies, based on maximum likelihood theory, which differ in how they search for potential clusters. The difference lies in the ability to identify territorial aggregations of regular shape (spatial scan statistic, Kulldorff and Nagarwalla, 1995; Kulldorff, 1997, 1999) or of "free" geometric shape (flexible scan statistic, Tango and Takahashi, 2005; genetic algorithm, Duczmal et al., 2007; greedy growth search, Yiannakoulias et al., 2007). The characteristics of each methodology thus make it possible to "capture" the possible geographical configurations of the aggregations present in the territory, and the underlying statistical theory, common to all of them, makes it easy to compare the results obtained.
Indeed, the persistence of an area characterized by a high propensity for longevity makes the identified cluster of considerable interest for further investigation. The criterion used to assess the persistence of a cluster was derived from graph theory, with particular reference to multigraphs. The idea is to compare, for the same search parameters, the graphs associated with the spatial aggregations identified by the different methodologies, by evaluating the occurrences of the links between pairs of vertices. Demographic considerations and a review of the existing literature on longevity led to the definition of an (open) age class to represent the phenomenon in our research: individuals aged 95 years or older (denoted 95+) were considered. The summary measure used to describe the phenomenon is a specific longevity indicator borrowed from demography, the Centenarian Rate (CR) (Robine and Caselli, 2005). It is defined as the ratio between the 95+ population and the population resident in the same municipality at the 1961 census. The idea behind the CR is to compare the long-lived individuals at one point in time with those present in the same area about 40 years before the observation, assuming that the migratory effect of a population can be considered negligible beyond 60 years of age. The propensity for longevity affects the areas of Emilia-Romagna differently. The provinces with the greatest longevity are Bologna, Ravenna and part of Forlì-Cesena, while the province of Ferrara stands out for a low level of the phenomenon.
The distinction by sex does not appear clear-cut: men aged 95+, fewer in number than women, reside mainly in the municipalities of the provinces of Bologna and Ravenna, with some extension into the Forlì area, similarly to the female population, which, however, shows a greater prevalence in the territories of Bologna and Forlì-Cesena, including some areas around Rimini. The western provinces of the region, by contrast, are not significantly affected by this phenomenon. The cluster detection methodologies used in the study produced broadly similar results despite their different search criteria. The spatial scan statistic is confirmed as an effective and fast methodology, but the regular geometric constraint imposed on the cluster limits its use, revealing poor adaptability in identifying irregular aggregations. The FSC methodology showed good search capability and execution speed, complemented by a clear and detailed description of the results and by the possibility of displaying the final clusters graphically, albeit with a minimal level of detail. The main limitation of the methodology is the small size of the final cluster: the excessive computational effort required by the procedure leads to setting the maximum limit below 30 areas, making it usable only in studies where the phenomenon is assumed to have a limited spatial extent. The genetic algorithm GA proves effective in identifying clusters of any shape and extent, albeit with a lower execution speed than the procedures described so far. Without an adequate selection of the search parameters, the procedure can identify very irregular and extensive clusters, which suggests using a non-zero penalty during the search.
The choice of the search parameters is in any case neither easy nor immediate and is often left to the researcher's experience. This way of proceeding, together with the lack of prior information about the phenomenon, increases the degree of subjectivity introduced in the selection of the parameters, influencing the final results. Finally, the GGS methodology requires a markedly higher computational load than the other methodologies used, and the introduction of two control parameters encourages greater arbitrariness in selecting suitable search values; moreover, the recent implementation of the procedure and the lack of studies on real data call for a larger number of trials during the cluster search phase. SECS-S/01 Statistica
8	Esperimenti per modelli parzialmente lineari con applicazione ai computer experiments [Experiments for partially linear models with application to computer experiments] Zagoraiou, Maroussa <1979> 02 April 2008 No description available. SECS-S/01 Statistica
9	Analyzing the dependence structure of microarray data: a copula–based approach Di Lascio, Francesca Marta Lilja <1979> 02 April 2008 The main aim of this Ph.D. dissertation is the study of clustering dependent data by means of copula functions, with particular emphasis on microarray data. Copula functions are a popular multivariate modeling tool in any field where multivariate dependence is of great interest, and their use in clustering has not yet been investigated. The first part of this work reviews the literature on clustering methods, copula functions and microarray experiments. Attention focuses on the K–means (Hartigan, 1975; Hartigan and Wong, 1979), the hierarchical (Everitt, 1974) and the model–based (Fraley and Raftery, 1998, 1999, 2000, 2007) clustering techniques, because their performance is compared. Then, the probabilistic interpretation of Sklar's theorem (Sklar, 1959), estimation methods for copulas such as Inference for Margins (Joe and Xu, 1996), and the Archimedean and Elliptical copula families are presented. Finally, applications of clustering methods and copulas to genetic and microarray experiments are highlighted. The second part contains the original contribution. A simulation study is performed to evaluate the performance of the K–means and the hierarchical bottom–up clustering methods in identifying clusters according to the dependence structure of the data generating process. Different simulations are performed by varying different conditions (e.g., the kind of margins (distinct, overlapping and nested) and the value of the dependence parameter), and the results are evaluated by means of different measures of performance. In light of the simulation results and of the limits of the two clustering methods investigated, a new clustering algorithm based on copula functions ('CoClust' for short) is proposed.
The basic idea, the iterative procedure of the CoClust and a description of the R functions written, with their output, are given. The CoClust algorithm is tested on simulated data (by varying the number of clusters, the copula models, the dependence parameter value and the degree of overlap of margins) and is compared with model–based clustering using different measures of performance, such as the percentage of correctly identified numbers of clusters and the non-rejection percentage of H0. It is shown that the CoClust algorithm overcomes all the observed limits of the other clustering techniques investigated and is able to identify clusters according to the dependence structure of the data, independently of the degree of overlap of margins and the strength of the dependence. The CoClust uses a criterion based on the maximized log–likelihood function of the copula and can account for virtually any dependence relationship between observations. Several distinctive characteristics of the CoClust are shown, e.g. its capability of identifying the true number of clusters and the fact that it does not require a starting classification. Finally, the CoClust algorithm is applied to the real microarray data of Hedenfalk et al. (2001), both to the gene expressions observed in three different cancer samples and to the columns (tumor samples) of the whole data matrix. SECS-S/01 Statistica
10	Modello multilevel a classi latenti: estensione al modello multidimensionale [Multilevel latent class model: extension to the multidimensional model] Del Giovane, Cinzia <1979> 02 April 2008 No description available. SECS-S/01 Statistica
