51 |
Regularization for Sparseness and Smoothness: Applications in System Identification and Signal Processing
Ohlsson, Henrik January 2010
In system identification, the Akaike Information Criterion (AIC) is a well-known method for balancing model fit against model complexity. Regularization here acts as a price on model complexity. In statistics and machine learning, regularization has gained popularity due to modeling methods such as Support Vector Machines (SVM), ridge regression and lasso. Regularization also often shows up implicitly in Bayesian approaches to modeling, where it can be associated with prior knowledge. Regularization has also had a great impact on many applications, particularly in clinical imaging. In breast cancer imaging, for example, the number of sensors is physically restricted, which leads to long scan times. Regularization and sparsity can be used to reduce them. In Magnetic Resonance Imaging (MRI), the number of scans is physically limited, and regularization plays an important role in obtaining high-resolution images.
Regularization shows up in a variety of situations and is a well-known technique for handling ill-posed problems and controlling overfitting. We focus on the use of regularization to obtain sparseness and smoothness and discuss novel developments relevant to system identification and signal processing.
In regularization for sparsity, a quantity is forced to contain elements equal to zero, that is, to be sparse. The quantity could, for example, be the regression parameter vector of a linear regression model, in which case regularization results in a tool for variable selection. Sparsity has had a huge impact on neighboring disciplines, such as machine learning and signal processing, but a rather limited effect on system identification. One of the major contributions of this thesis is therefore new developments in system identification using sparsity. In particular, a novel method for the estimation of segmented ARX models using regularization for sparsity is presented. A technique for piecewise-affine system identification is also elaborated on, as are several novel applications in signal processing.
Another property that regularization can be used to impose is smoothness. Requiring the relation between regressors and predictions to be a smooth function is a way to control overfitting. We are here particularly interested in regression problems with regressors constrained to limited regions of the regressor space, e.g., a manifold. For this type of system we develop a new regression technique, Weight Determination by Manifold Regularization (WDMR). WDMR is inspired by applications in biology and developments in manifold learning, and uses regularization for smoothness to obtain smooth estimates. The use of regularization for smoothness in linear system identification is also discussed.
The thesis also presents a real-time functional Magnetic Resonance Imaging (fMRI) bio-feedback setup. The setup has served as a proof of concept and has been the foundation for several real-time fMRI studies.
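The variable-selection effect of regularization for sparsity described above can be illustrated with a minimal sketch. It assumes an orthonormal regressor matrix and suitably scaled penalties, in which case the lasso estimate is the soft-thresholded least-squares estimate and the ridge estimate is a rescaled one. The numbers and the names `soft_threshold`, `ridge_shrink`, `ols` and `lam` are illustrative assumptions and are not taken from the thesis.

```python
def soft_threshold(z, lam):
    """Lasso shrinkage: move z toward zero by lam and clip at exactly zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    """Ridge shrinkage: scale z toward zero without ever reaching it."""
    return z / (1.0 + lam)


# Hypothetical least-squares estimates of five regression parameters.
ols = [3.0, -0.4, 0.2, -2.5, 0.05]
lam = 0.5  # assumed regularization weight

lasso = [soft_threshold(z, lam) for z in ols]
ridge = [ridge_shrink(z, lam) for z in ols]

# Indices of the regressors that the L1 penalty keeps in the model.
selected = [j for j, b in enumerate(lasso) if b != 0.0]
```

In this sketch the L1 penalty sets the three small parameters exactly to zero, so it acts as a variable selector, whereas the ridge penalty only shrinks every parameter and leaves all of them nonzero.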
|
52 |
Adaptive L1 regularized second-order least squares method for model selection
Xue, Lin 11 September 2015
The second-order least squares (SLS) method for regression models, proposed by Wang (2003, 2004), is based on the first two conditional moments of the response variable given the observed predictor variables. Wang and Leblanc (2008) show that the SLS estimator (SLSE) is asymptotically more efficient than the ordinary least squares estimator (OLSE) if the third moment of the random error is nonzero. We apply the SLS method to variable selection problems and propose the adaptively weighted L1 regularized SLSE (L1-SLSE). The L1-SLSE is robust against the shape of error distributions in variable selection problems. Finite sample simulation studies show that the L1-SLSE is more efficient than the L1-OLSE in the case of asymmetric error distributions. A real data application of the L1-SLSE is presented to demonstrate the use of the method. / October 2015
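As a rough illustration of how a criterion built on the first two conditional moments can be combined with a weighted L1 penalty, the sketch below evaluates such an objective for a linear model. It makes several simplifying assumptions that are not stated in the abstract: the weight matrix of the SLS criterion is taken to be the identity, the error variance `sigma2` is constant, and the penalty has the form `lam * sum(w_j * |beta_j|)`. The function name `sls_objective` and all parameter names are hypothetical.

```python
def sls_objective(beta, sigma2, X, y, weights=None, lam=0.0):
    """Second-order least squares criterion for a linear model (identity
    weighting assumed) plus an optional adaptively weighted L1 penalty."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        mean_i = sum(b * x for b, x in zip(beta, x_i))  # E[y | x]
        r1 = y_i - mean_i                      # first-moment residual
        r2 = y_i ** 2 - (mean_i ** 2 + sigma2)  # second-moment residual
        total += r1 ** 2 + r2 ** 2
    if weights is None:
        weights = [1.0] * len(beta)
    penalty = lam * sum(w * abs(b) for w, b in zip(weights, beta))
    return total + penalty


# Hypothetical noise-free data generated with beta = [2, -1].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, -1.0, 1.0]

fit_only = sls_objective([2.0, -1.0], 0.0, X, y)
penalized = sls_objective([2.0, -1.0], 0.0, X, y, weights=[1.0, 2.0], lam=1.0)
```

In an actual estimator this objective would be minimized over `beta` and `sigma2`. The sketch only shows how the two moment residuals and the penalty enter the criterion.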
|
53 |
Statistical Discovery of Biomarkers in Metagenomics
Abdul Wahab, Ahmad Hakeem January 2015
Metagenomics holds considerable potential for uncovering relationships within microbial communities that have yet to be discovered, particularly because the field circumvents the need to isolate and culture microbes from their natural environmental settings. A common research objective is to detect biomarkers, that is, microbes that are associated with changes in a status. For instance, identifying such microbes across conditions such as healthy and diseased groups allows researchers to identify pathogens and probiotics. This is often achieved via analysis of the differential abundance of microbes. The problem is that differential abundance analysis looks at each microbe individually without considering the possible associations the microbes may have with each other. This is not favorable, since microbes rarely act individually but rather within intricate communities involving other microbes. An alternative would be variable selection techniques such as Lasso or Elastic Net, which consider all the microbes simultaneously and conduct selection. However, Lasso often selects only a representative feature of a correlated cluster of features, and the Elastic Net may incorrectly select unimportant features too frequently and erratically due to high levels of sparsity and variation in the data.
In this research paper, the proposed method, AdaLassop, is an augmented variable selection technique that overcomes the shortcomings of Lasso and Elastic Net. It provides researchers with a holistic model that takes into account the effects of selected biomarkers in the presence of other important biomarkers. For AdaLassop, variable selection on sparse ultra-high dimensional data is implemented using the Adaptive Lasso with p-values extracted from Zero Inflated Negative Binomial Regressions as augmented weights. Comprehensive simulations involving varying correlation structures indicate that AdaLassop has optimal performance in the presence of multicollinearity. This is especially apparent as sample size grows. Application of AdaLassop to a metagenome-wide study of diabetic patients reveals both pathogens and probiotics that have been researched in the medical field.
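The idea of using p-values as penalty weights in an adaptive lasso can be sketched with a small coordinate-descent solver. This is not the AdaLassop implementation. The p-values below are made-up placeholders rather than output of a Zero Inflated Negative Binomial regression, using each p-value directly as its penalty weight is an assumption, and a squared-error loss is used for simplicity. The names `adaptive_lasso`, `p_values` and `lam` are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly zero when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def adaptive_lasso(X, y, weights, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam * sum_j weights[j]*|b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta for beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Hypothetical design with three features and a response driven mainly by feature 0.
X = [[1.0, 1.0, 1.0], [1.0, -1.0, 1.0], [-1.0, 1.0, 1.0], [-1.0, -1.0, 1.0]]
y = [2.05, 1.95, -1.95, -2.05]

# Placeholder p-values: a small p-value means a light penalty on that feature.
p_values = [0.001, 0.8, 0.6]
beta_hat = adaptive_lasso(X, y, weights=p_values, lam=0.1)
```

With these assumed weights, the feature with the small p-value is barely penalized and keeps a coefficient close to 2, while the features with large p-values are penalized heavily and are removed from the model.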
|
54 |
Reverse Engineering of Biological Systems
2014 July 1900
A gene regulatory network (GRN) consists of a set of genes and the regulatory relationships between them. As outputs of the GRN, gene expression data contain important information that can be used to reconstruct the GRN to a certain degree. However, reverse engineering GRNs from gene expression data is a challenging problem in systems biology. Conventional methods fail to infer GRNs from gene expression data because the number of observations is small relative to the large number of genes. The inherent noise in the data makes the inference accuracy relatively low, and the combinatorial explosion inherent in the problem makes the inference task extremely difficult. This study aims at reconstructing GRNs from time-course gene expression data based on GRN models, using system identification and parameter estimation methods. The main content consists of three parts: (1) a review of the methods for reverse engineering of GRNs, (2) reverse engineering of GRNs based on linear models and (3) reverse engineering of GRNs based on a nonlinear model, specifically S-systems.
In the first part, after the necessary background and challenges of the problem are introduced, various methods for the inference of GRNs are comprehensively reviewed from two aspects: models and inference algorithms. The advantages and disadvantages of each method are discussed.
The second part focuses on inferring GRNs from time-course gene expression data based on linear models. First, the statistical properties of two sparse penalties, adaptive LASSO and SCAD, with an autoregressive model are studied. It is shown that the proposed methods using these two penalties can asymptotically reconstruct the underlying networks. This provides a solid foundation for these methods and their extensions. Second, the integration of multiple datasets should improve the accuracy of GRN inference. A novel method, Huber group LASSO, is developed to infer GRNs from multiple time-course datasets; it is also robust to the large noise and outliers that the data may contain. An efficient algorithm is also developed and its convergence analysis is provided.
The third part can be further divided into two phases: estimating the parameters of S-systems when the system structure is known, and inferring the S-systems when it is not. Two methods, alternating weighted least squares (AWLS) and auxiliary function guided coordinate descent (AFGCD), have been developed to estimate the parameters of S-systems from time-course data. AWLS takes advantage of the special structure of S-systems and significantly outperforms one existing method, alternating regression (AR). AFGCD uses auxiliary function and coordinate descent techniques to obtain an efficient iteration formula, and its convergence is theoretically guaranteed. For the case where the system structure is unknown, a novel method, the pruning separable parameter estimation algorithm (PSPEA), is developed to locally infer the S-systems, again taking advantage of the special structure of the S-system model. PSPEA is then combined with a continuous genetic algorithm (CGA) to form a hybrid algorithm that can globally reconstruct the S-systems.
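For readers unfamiliar with S-systems, the sketch below evaluates the right-hand side of the standard S-system form, in which each rate of change is a production power-law term minus a degradation power-law term. The function name `s_system_rhs` and the two-gene parameter values are illustrative assumptions and do not come from the thesis.

```python
def s_system_rhs(x, alpha, beta, g, h):
    """Return dx/dt with dx_i/dt = alpha_i * prod_j x_j**g[i][j]
                                 - beta_i  * prod_j x_j**h[i][j]."""
    dxdt = []
    for i in range(len(x)):
        production = alpha[i]
        degradation = beta[i]
        for j, x_j in enumerate(x):
            production *= x_j ** g[i][j]
            degradation *= x_j ** h[i][j]
        dxdt.append(production - degradation)
    return dxdt


# Hypothetical two-gene system. A zero exponent means "no regulatory edge".
alpha = [1.0, 2.0]
beta = [1.0, 1.0]
g = [[0.0, 0.5],   # production of gene 0 is activated by gene 1
     [1.0, 0.0]]   # production of gene 1 is activated by gene 0
h = [[1.0, 0.0],   # each gene degrades in proportion to its own level
     [0.0, 1.0]]

at_steady_state = s_system_rhs([2.0, 4.0], alpha, beta, g, h)
away_from_it = s_system_rhs([1.0, 1.0], alpha, beta, g, h)
```

In this representation, parameter estimation with a known structure amounts to fitting `alpha`, `beta` and the nonzero exponents, while structure inference also has to decide which entries of `g` and `h` are zero.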
|
55 |
Rôle des répétitions textuelles dans les Psaumes de la Pénitence de LASSUS
Lessoil-Daelman, Marcelle January 1993
Textual repetitions abound in the verses of the Seven Penitential Psalms of Lassus, and this research attempts to discover their function. A total of one hundred and thirty-two verses were analyzed. The results of this investigation reveal numerous mathematical figures underlying the structure of the entire work, and the influence of repetitions is conspicuous in the organization of each figure. Moreover, this study shows, to a lesser extent, the mutual influence between form and text expression. A detailed method of calculation is also provided, which may eventually be applied to other works of the sixteenth-century repertoire.
|
56 |
Computing a journal meta-ranking using paired comparisons and adaptive lasso estimators
Vana, Laura, Hochreiter, Ronald, Hornik, Kurt 01 1900
In a "publish-or-perish culture", the ranking of scientific journals plays a central role in assessing performance in the current research environment. With a wide range of existing methods for deriving journal rankings, meta-rankings have gained popularity as a means of aggregating different information sources. In this paper, we propose a method to create a meta-ranking using heterogeneous journal rankings. Employing a parametric model for paired comparison data, we estimate quality scores for 58 journals in the OR/MS/POM community; combined with a shrinkage procedure, this allows for the identification of clusters of journals with similar quality. The use of paired comparisons provides a flexible framework for deriving an aggregated score while eliminating the problem of missing data.
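To make the paired-comparison idea concrete, the sketch below estimates quality scores from win counts. It assumes a Bradley-Terry model fitted with the classical minorization-maximization update. The abstract only says "a parametric model", so the model choice, the function name `bradley_terry` and the toy win counts are assumptions, and the shrinkage step that clusters journals is not shown.

```python
def bradley_terry(wins, n_iter=500):
    """Estimate Bradley-Terry scores.

    wins[i][j] is the number of rankings in which journal i is placed above
    journal j. Pairs that were never compared simply contribute nothing.
    """
    k = len(wins)
    scores = [1.0 / k] * k
    for _ in range(n_iter):
        updated = []
        for i in range(k):
            total_wins = sum(wins[i][j] for j in range(k) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (scores[i] + scores[j])
                for j in range(k) if j != i
            )
            updated.append(total_wins / denom if denom > 0 else scores[i])
        norm = sum(updated)
        scores = [s / norm for s in updated]  # scores are identified up to scale
    return scores


# Hypothetical comparisons of three journals across four source rankings.
wins = [[0, 3, 4],
        [1, 0, 3],
        [0, 1, 0]]
scores = bradley_terry(wins)

# Two journals where the first is ranked higher in 3 of 4 sources.
pair_scores = bradley_terry([[0, 3], [1, 0]])
```

Because only the journals that appear together in a source ranking generate comparisons, a journal missing from one source does not need to be imputed. This is the sense in which paired comparisons sidestep missing data.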
|
57 |
Penalized Regression Methods in the Study of Serum Biomarkers for Overweight and Obesity
Vasquez, Monica M. January 2017
The study of circulating biomarkers and their association with disease outcomes has become progressively complex due to advances in the measurement of these biomarkers through multiplex technologies. Although the availability of numerous serum biomarkers is highly promising, multiplex assays present statistical challenges due to the high dimensionality of these data. In this dissertation, three studies are presented that address these challenges using L1 penalized regression methods.
In the first part of the dissertation, an extensive simulation study is performed for the logistic regression model that compares the Least Absolute Shrinkage and Selection Operator (LASSO) method with five LASSO-type methods under scenarios that are present in serum biomarker research, such as high correlation between biomarkers, weak associations with the outcome, and a small number of true signals. Results show that the choice of the optimal LASSO-type method depends on the data structure and should be guided by the research objective. The methods are then applied to the Tucson Epidemiological Study of Airway Obstructive Disease (TESAOD) for the identification of serum biomarkers of overweight and obesity.
Measurement of serum biomarkers using multiplex technologies may be more variable than measurement using traditional single biomarker methods. Measurement error may induce bias in parameter estimation and complicate the variable selection process. In the second part of the dissertation, an existing measurement error correction method for penalized linear regression with the L1 penalty is adapted to accommodate validation data on a randomly selected subset of the study sample. A simulation study and analysis of TESAOD data demonstrate that the proposed approach improves variable selection and reduces bias in parameter estimation for validation data as small as 10 percent of the study sample. In the third part of the dissertation, a measurement error correction method that utilizes validation data is proposed for the penalized logistic regression model with the L1 penalty. A simulation study and analysis of TESAOD data are used to evaluate the proposed method. Results show an improvement in variable selection.
|
58 |
Statistical Modeling and Forecasting for Time Series With Trend
Alraddadi, Rawiyah January 2021
No description available.
|
59 |
Lasso for Autoregressive and Moving Average Coefficients via Residuals of Unobservable Time Series
Hanh, Nguyen T. January 2018
No description available.
|
60 |
Bayesian Variable Selection for High-Dimensional Data with an Ordinal Response
Zhang, Yiran January 2019
No description available.
|