21 |
An evolutionary approach to the design of experiments for combinatorial optimization with an application to enzyme engineeringBorrotti, Matteo <1981> 18 March 2011 (has links)
In a large number of problems the high dimensionality of the search space, the vast number of variables and the economical constrains limit the ability of classical techniques to reach the optimum of a function, known or unknown. In this thesis we investigate the possibility to combine approaches from advanced statistics and optimization algorithms in such a way to better explore the combinatorial search space and to increase the performance of the approaches. To this purpose we propose two methods: (i) Model Based Ant Colony Design and (ii) Naïve Bayes Ant Colony Optimization. We test the performance of the two proposed solutions on a simulation study and we apply the novel techniques on an appplication in the field of Enzyme Engineering and Design.
|
22 |
A Multilevel Model with Time Series Components for the Analysis of Tribal Art PricesModugno, Lucia <1983> 03 February 2012 (has links)
In the present work we perform an econometric analysis of the Tribal art market. To this aim, we use a unique and original database that includes information on Tribal art market auctions worldwide from 1998 to 2011. In Literature, art prices are modelled through the hedonic regression model, a classic fixed-effect model.
The main drawback of the hedonic approach is the large number of parameters, since, in general, art data include many categorical variables. In this work, we propose a multilevel
model for the analysis of Tribal art prices that takes into account the influence of time on artwork prices. In fact, it is natural to assume that time exerts an influence over the price dynamics in various ways. Nevertheless, since the set of
objects change at every auction date, we do not have repeated measurements
of the same items over time. Hence, the dataset does not constitute a proper
panel; rather, it has a two-level
structure in that items, level-1 units, are grouped in time points, level-2
units. The main theoretical contribution is the extension of classical
multilevel models to cope with the case described above. In particular,
we introduce a model with time dependent random effects at the second
level. We propose a novel specification of the model, derive the maximum likelihood
estimators and implement them through the E-M algorithm. We test the finite sample properties of the estimators and the validity of the own-written R-code by means of a simulation study. Finally, we show that the new model improves considerably the fit of the Tribal art data with respect to both the hedonic regression model and the classic multilevel model.
|
23 |
Employing Distances in Design-based Spatial EstimationVagheggini, Alessandro <1984> 18 February 2013 (has links)
In the last couple of decades we assisted to a reappraisal of spatial design-based techniques. Usually the spatial information regarding the spatial location of the individuals of a population has been used to develop efficient sampling designs.
This thesis aims at offering a new technique for both inference on individual values and global population values able to employ the spatial information available before sampling at estimation level by rewriting a deterministic interpolator under a design-based framework. The achieved point estimator of the individual values is treated both in the case of finite spatial populations and continuous spatial domains, while the theory on the estimator of the population global value covers the finite population case only.
A fairly broad simulation study compares the results of the point estimator with the simple random sampling without replacement estimator in predictive form and the kriging, which is the benchmark technique for inference on spatial data. The Monte Carlo experiment is carried out on populations generated according to different superpopulation methods in order to manage different aspects of the spatial structure. The simulation outcomes point out that the proposed point estimator has almost the same behaviour as the kriging predictor regardless of the parameters adopted for generating the populations, especially for low sampling fractions. Moreover, the use of the spatial information improves substantially design-based spatial inference on individual values.
|
24 |
Local Trigonometric Methods for Time Series Smoothing.Gentile, Maria <1981> 15 May 2014 (has links)
The thesis is concerned with local trigonometric regression methods. The aim was to develop a method for extraction of cyclical components in time series. The main results of the thesis are the following.
First, a generalization of the filter proposed by Christiano and Fitzgerald is furnished for the smoothing of ARIMA(p,d,q) process.
Second, a local trigonometric filter is built, with its statistical properties.
Third, they are discussed the convergence properties of trigonometric estimators, and the problem of choosing the order of the model.
A large scale simulation experiment has been designed in order to assess the performance of the proposed models and methods. The results show that local trigonometric regression may be a useful tool for periodic time series analysis.
|
25 |
Multidimensional item response theory models with general and specific latent traits for ordinal dataMartelli, Irene <1984> 15 May 2014 (has links)
The aim of the thesis is to propose a Bayesian estimation through Markov chain Monte Carlo of multidimensional item response theory models for graded responses with complex structures and correlated traits. In particular, this work focuses on the multiunidimensional and the additive underlying latent structures, considering that the first one is widely used and represents a classical approach in multidimensional item response analysis, while the second one is able to reflect the complexity of real interactions between items and respondents.
A simulation study is conducted to evaluate the parameter recovery for the proposed models under different conditions (sample size, test and subtest length, number of response categories, and correlation structure).
The results show that the parameter recovery is particularly sensitive to the sample size, due to the model complexity and the high number of parameters to be estimated. For a sufficiently large sample size the parameters of the multiunidimensional and additive graded response models are well reproduced. The results are also affected by the trade-off between the number of items constituting the test and the number of item categories.
An application of the proposed models on response data collected to investigate Romagna and San Marino residents' perceptions and attitudes towards the tourism industry is also presented.
|
26 |
Comparing Different Approaches for Clustering Categorical DataAnderlucci, Laura <1984> 03 February 2012 (has links)
There are different ways to do cluster analysis of categorical data in the literature and the choice among them is strongly related to the aim of the researcher, if we do not take into account time and economical constraints.
Main approaches for clustering are usually distinguished into model-based and distance-based methods: the former assume that objects belonging to the same class are similar in the sense that their observed values come from the same probability distribution, whose parameters are unknown and need to be estimated; the latter evaluate distances among objects by a defined dissimilarity measure and, basing on it, allocate units to the closest group.
In clustering, one may be interested in the classification of similar objects into groups, and one may be interested in finding observations that come from the same true homogeneous distribution.
But do both of these aims lead to the same clustering? And how good are clustering methods designed to fulfil one of these aims in terms of the other?
In order to answer, two approaches, namely a latent class model (mixture of multinomial distributions) and a partition around medoids one, are evaluated and compared by Adjusted Rand Index, Average Silhouette Width and Pearson-Gamma indexes in a fairly wide simulation study. Simulation
outcomes are plotted in bi-dimensional graphs via Multidimensional Scaling; size of points is proportional to the number of points that overlap and different colours are used according to the cluster membership.
|
27 |
Bayesian space-time data fusion for real-time forecasting and map uncertaintyPaci, Lucia <1985> 17 January 2014 (has links)
Environmental computer models are deterministic models devoted to predict several environmental phenomena such as air pollution or meteorological events. Numerical model output is given in terms of averages over grid cells, usually at high spatial and temporal resolution. However, these outputs are often biased with unknown calibration and not equipped with any information about the associated uncertainty. Conversely, data collected at monitoring stations is more accurate since they essentially provide the true levels. Due the leading role played by numerical models, it now important to compare model output with observations. Statistical methods developed to combine numerical model output and station data are usually referred to as data fusion.
In this work, we first combine ozone monitoring data with ozone predictions from the Eta-CMAQ air quality model in order to forecast real-time current 8-hour average ozone level defined as the average of the previous four hours, current hour, and predictions for the next
three hours. We propose a Bayesian downscaler
model based on first differences with a flexible
coefficient structure and an efficient computational strategy to fit model parameters. Model validation for the eastern United States shows consequential improvement of our fully inferential approach compared with the current real-time forecasting system. Furthermore, we consider the introduction of temperature data from a weather forecast model into the downscaler, showing improved real-time ozone predictions.
Finally, we introduce a hierarchical model to obtain spatially varying uncertainty associated with numerical model output. We show how we can learn about such uncertainty through suitable stochastic data fusion modeling using some external validation data. We illustrate our Bayesian model by providing the uncertainty map associated with a temperature output over the northeastern United States.
|
28 |
Permutation-based stochastic ordering using pairwise comparisons / Ordinamento stocastico basato sulle permutazioni utilizzando confronti a coppieMattiello, Federico <1985> 02 February 2015 (has links)
The topic of this work concerns nonparametric permutation-based methods aiming to find a ranking (stochastic ordering) of a given set of groups (populations), gathering together information from multiple variables under more than one experimental designs.
The problem of ranking populations arises in several fields of science from the need of comparing G>2 given groups or treatments when the main goal is to find an order while taking into account several aspects. As it can be imagined, this problem is not only of theoretical interest but it
also has a recognised relevance in several fields, such as industrial experiments or behavioural sciences, and this is reflected by the vast literature on the topic, although sometimes the problem is associated with different keywords such as: "stochastic ordering", "ranking", "construction
of composite indices" etc., or even "ranking probabilities" outside of the strictly-speaking statistical literature.
The properties of the proposed method are empirically evaluated by means of an extensive simulation study, where several aspects of interest are let to vary within a reasonable practical range. These aspects comprise: sample size, number of variables, number of groups, and distribution of noise/error.
The flexibility of the approach lies mainly in the several available choices for the test-statistic and in the different types of experimental design that
can be analysed. This render the method able to be tailored to the specific problem and the to nature of the data at hand.
To perform the analyses an R package called SOUP (Stochastic Ordering Using Permutations) has been written and it is available on CRAN.
|
29 |
A Bayesian changepoint analysis on spatio-temporal point processesAltieri, Linda <1986> 02 February 2015 (has links)
Changepoint analysis is a well established area of statistical research, but in the context of spatio-temporal point processes it is as yet relatively unexplored. Some substantial differences with regard to standard changepoint analysis have to be taken into account: firstly, at every time point the datum is an irregular pattern of points; secondly, in real situations issues of spatial dependence between points and temporal dependence within time segments raise. Our motivating example consists of data concerning the monitoring and recovery of radioactive particles from Sandside beach, North of Scotland; there have been two major changes in the equipment used to detect the particles, representing known potential changepoints in the number of retrieved particles. In addition, offshore particle retrieval campaigns are believed may reduce the particle intensity onshore with an unknown temporal lag; in this latter case, the problem concerns multiple unknown changepoints. We therefore propose a Bayesian approach for detecting multiple changepoints in the intensity function of a spatio-temporal point process, allowing for spatial and temporal dependence within segments. We use Log-Gaussian Cox Processes, a very flexible class of models suitable for environmental applications that can be implemented using integrated nested Laplace approximation (INLA), a computationally efficient alternative to Monte Carlo Markov Chain methods for approximating the posterior distribution of the parameters. Once the posterior curve is obtained, we propose a few methods for detecting significant change points. We present a simulation study, which consists in generating spatio-temporal point pattern series under several scenarios; the performance of the methods is assessed in terms of type I and II errors, detected changepoint locations and accuracy of the segment intensity estimates. We finally apply the above methods to the motivating dataset and find good and sensible results about the presence and quality of changes in the process.
|
30 |
Differential expression analysis for sequence count data via mixtures of negative binomialsBonafede, Elisabetta <1987> 02 February 2015 (has links)
The recent advent of Next-generation sequencing technologies has revolutionized the way of analyzing the genome. This innovation allows to get deeper information at a lower cost and in less time, and provides data that are discrete measurements.
One of the most important applications with these data is the differential analysis, that is investigating if one gene exhibit a different expression level in correspondence of two (or more) biological conditions (such as disease states, treatments received and so on).
As for the statistical analysis, the final aim will be statistical testing and for modeling these data the Negative Binomial distribution is considered the most adequate one especially because it allows for "over dispersion". However, the estimation of the dispersion parameter is a very delicate issue because few information are usually available for estimating it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled first-type errors.
We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Afterwards, three consistent statistical tests are developed for differential expression analysis. We show that the proposed method improves the
sensitivity of detecting differentially expressed genes with respect to the common procedures, since it is the best one in reaching the nominal value for the first-type error, while keeping elevate power. The method is finally illustrated on prostate cancer RNA-seq data.
|
Page generated in 0.0354 seconds