1 |
Extensions of a Theory of Networks for Approximation and Learning: Dimensionality Reduction and Clustering
Poggio, Tomaso; Girosi, Federico. 01 April 1990
The theory developed in Poggio and Girosi (1989) shows the equivalence between regularization and a class of three-layer networks that we call regularization networks or Hyper Basis Functions. These networks are also closely related to the classical Radial Basis Functions used for interpolation tasks and to several pattern recognition and neural network algorithms. In this note, we extend the theory by defining a general form of these networks with two sets of modifiable parameters in addition to the coefficients $c_\alpha$: moving centers and adjustable norm weights.
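As a hedged sketch of the general form described above (the symbols $G$, $t_\alpha$, $W$, and $n$ do not appear in this abstract and are assumed here for illustration), a network with coefficients $c_\alpha$, moving centers, and a weighted norm can be written as:

$$f(x) = \sum_{\alpha=1}^{n} c_\alpha \, G\left(\|x - t_\alpha\|_W^2\right), \qquad \|x - t_\alpha\|_W^2 = (x - t_\alpha)^\top W^\top W \, (x - t_\alpha)$$

Under these assumptions, $G$ is a basis function, the $t_\alpha$ are the movable centers, and $W$ is the matrix of adjustable norm weights; learning would adjust $c_\alpha$, $t_\alpha$, and $W$ together.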
|
2 |
On the Dirichlet Prior and Bayesian Regularization
Steck, Harald; Jaakkola, Tommi S. 01 September 2002
A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a Dirichlet prior over the model parameters affects the learned model structure in a domain with discrete variables. Surprisingly, a weak prior, in the sense of a smaller equivalent sample size, leads to a strong regularization of the model structure (a sparse graph) given a sufficiently large data set. In particular, the empty graph is obtained in the limit of a vanishing strength of prior belief. This is diametrically opposite to what one may expect in this limit, namely the complete graph from an (unregularized) maximum likelihood estimate. Since the prior affects the parameters as expected, the prior strength balances a "trade-off" between regularizing the parameters and regularizing the structure of the model. We demonstrate the benefits of optimizing this trade-off in the sense of predictive accuracy.
|
3 |
Bagging Regularizes
Poggio, Tomaso; Rifkin, Ryan; Mukherjee, Sayan; Rakhlin, Alex. 01 March 2002
Intuitively, we expect that averaging --- or bagging --- different regressors with low correlation should smooth their behavior and be somewhat similar to regularization. In this note we make this intuition precise. Using an almost classical definition of stability, we prove that a certain form of averaging provides generalization bounds with a rate of convergence of the same order as Tikhonov regularization --- similar to fashionable RKHS-based learning algorithms.
|
4 |
Asymptotics and computations for approximation of method of regularization estimators
Lee, Sang-Joon. 29 August 2005
Inverse problems arise in many branches of natural science, medicine and engineering involving the recovery of a whole function given only a finite number of noisy measurements on functionals. Such problems are usually ill-posed, which causes severe difficulties for standard least-squares or maximum likelihood estimation techniques. These problems can be solved by a method of regularization. In this dissertation, we study various problems in the method of regularization. We develop asymptotic properties of the optimal smoothing parameters concerning levels of smoothing for estimating the mean function and an associated inverse function based on Fourier analysis. We present numerical algorithms for an approximated method of regularization estimator computation with linear inequality constraints. New data-driven smoothing parameter selection criteria are proposed in this setting. In addition, we derive a Bayesian credible interval for the approximated method of regularization estimators.
|
6 |
Multiscale Spectral-Domain Parameterization for History Matching in Structured and Unstructured Grid Geometries
Bhark, Eric Whittet. August 2011
Reservoir model calibration to production data, also known as history matching, is an essential tool for the prediction of fluid displacement patterns and related decisions concerning reservoir management and field development. The history matching of high resolution geologic models is, however, known to define an ill-posed inverse problem such that the solution of geologic heterogeneity is always non-unique and potentially unstable. A common approach to mitigating ill-posedness is to parameterize the estimable geologic model components, imposing a type of regularization that exploits geologic continuity by explicitly or implicitly grouping similar properties while retaining at least the minimum heterogeneity resolution required to reproduce the data. This dissertation develops novel methods of model parameterization within the class of techniques based on a linear transformation.
Three principal research contributions are made in this dissertation. First is the development of an adaptive multiscale history matching formulation in the frequency domain using the discrete cosine parameterization. Geologic model calibration is performed by its sequential refinement to a spatial scale sufficient to match the data. The approach reduces solution non-uniqueness, improves stability, and further balances model and data resolution as determined by a parameter identifiability metric. Second, a model-independent parameterization based on grid connectivity information is developed as a generalization of the cosine parameterization for applicability to generic grid geometries. The parameterization relates the spatial reservoir parameters to the modal shapes or harmonics of the grid on which they are defined, merging with a Fourier analysis in special cases (i.e., for rectangular grid cells of constant dimensions), and enabling a multiscale calibration of the reservoir model in the spectral domain. Third, a model-dependent parameterization is developed to combine grid connectivity with prior geologic information within a spectral domain representation. The resulting parameterization is capable of reducing geologic models while imposing prior heterogeneity on the calibrated model using the adaptive multiscale workflow.
In addition to methodological developments of the parameterization methods, an important consideration in this dissertation is their applicability to field scale reservoir models with varying levels of prior geologic complexity on par with current industry standards.
|
7 |
Asymptotics of Gaussian Regularized Least-Squares
Lippert, Ross; Rifkin, Ryan. 20 October 2005
We consider regularized least-squares (RLS) with a Gaussian kernel. We prove that if we let the Gaussian bandwidth $\sigma \rightarrow \infty$ while letting the regularization parameter $\lambda \rightarrow 0$, the RLS solution tends to a polynomial whose order is controlled by the relative rates of decay of $\frac{1}{\sigma^2}$ and $\lambda$: if $\lambda = \sigma^{-(2k+1)}$, then, as $\sigma \rightarrow \infty$, the RLS solution tends to the $k$th order polynomial with minimal empirical error. We illustrate the result with an example.
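For orientation, the estimator discussed in this abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes one-dimensional inputs, the kernel $\exp(-(x - z)^2 / (2\sigma^2))$, and the linear system $(K + \lambda I)c = y$ (some formulations scale $\lambda$ by the sample size). The function names are hypothetical. It does not attempt to reproduce the limiting polynomial behavior, which becomes numerically ill-conditioned as $\sigma$ grows.

```python
import math


def gaussian_kernel(x, z, sigma):
    """Gaussian kernel on scalars; the 2*sigma^2 scaling is an assumption."""
    return math.exp(-((x - z) ** 2) / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    c = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (M[i][n] - s) / M[i][i]
    return c


def rls_fit(xs, ys, sigma, lam):
    """Return coefficients c solving (K + lam * I) c = y."""
    n = len(xs)
    A = [[gaussian_kernel(xs[i], xs[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, ys)


def rls_predict(xs, c, sigma, x):
    """Evaluate f(x) = sum_i c_i k(x, x_i)."""
    return sum(ci * gaussian_kernel(x, xi, sigma) for ci, xi in zip(c, xs))
```

With a very small `lam` and a moderate `sigma`, the fitted function nearly interpolates the training points; increasing `lam` trades fit for smoothness.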
|
8 |
On Mixup Training of Neural Networks
Liu, Zixuan. 14 December 2022
Deep neural networks are powerful tools of machine learning. Despite their capacity to fit the training data, they tend to perform poorly on unseen data. To improve the generalization of deep neural networks, a variety of regularization techniques have been proposed. This thesis studies a simple yet effective regularization scheme, Mixup, which has been proposed recently. Briefly speaking, Mixup creates synthetic examples by linearly interpolating random pairs of the real examples and uses the synthetic examples for training. Although Mixup has been empirically shown to be effective on various classification tasks for neural network models, its working mechanism and possible limitations have not been well understood.
One potential problem of Mixup is known as manifold intrusion, in which the synthetic examples "intrude" on the data manifolds of the real data, resulting in conflicts between the synthetic labels and the ground-truth labels of the synthetic examples. The first part of this thesis investigates strategies for resolving the manifold intrusion problem. We focus on two strategies. The first strategy, which we call "relabelling", attempts to find better labels for the synthetic data; the second strategy, which we call "cautious mixing", carefully selects the interpolating parameters to generate the synthetic examples. Through extensive experiments over several design choices, we observe that the "cautious mixing" strategy appears to perform better.
The second part of this thesis reports a previously unobserved phenomenon in Mixup training: on a number of standard datasets, the performance of the Mixup-trained models starts to decay after training for a large number of epochs, giving rise to a U-shaped generalization curve. This behavior is further aggravated when the size of the original dataset is reduced. To help understand this behavior of Mixup, we show theoretically that Mixup training may introduce undesired data-dependent label noise to the synthetic data. By analyzing a least-squares regression problem with a random feature model, we explain why noisy labels may cause the U-shaped curve to occur: Mixup improves generalization by fitting the clean patterns at the early training stage, but as training progresses, the model begins to overfit the noise in the synthetic data. Extensive experiments are performed on a variety of benchmark datasets, validating this explanation.
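The interpolation step described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis code: examples are plain lists of floats, labels are assumed to be one-hot lists, and the mixing coefficient is assumed to be drawn from a Beta(alpha, alpha) distribution, as in the original Mixup proposal. The function names `mixup_pair` and `mixup_batch` are hypothetical.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, lam):
    """Linearly interpolate two examples and their (one-hot) labels."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix


def mixup_batch(xs, ys, alpha=1.0, rng=None):
    """Mix each example with a randomly chosen partner from the same batch.

    One coefficient lam ~ Beta(alpha, alpha) is shared by the whole batch
    (an assumption; per-example coefficients are equally possible).
    """
    if rng is None:
        rng = random.Random(0)
    lam = rng.betavariate(alpha, alpha)
    partners = list(range(len(xs)))
    rng.shuffle(partners)  # random pairing of examples
    mixed = [mixup_pair(xs[i], ys[i], xs[j], ys[j], lam)
             for i, j in enumerate(partners)]
    return [m[0] for m in mixed], [m[1] for m in mixed], lam
```

A "cautious mixing" variant in the sense of the abstract would presumably restrict how `lam` is chosen; the abstract does not specify the rule, so none is sketched here.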
|
9 |
Regularization Theory and Shape Constraints
Verri, Alessandro; Poggio, Tomaso. 01 September 1986
Many problems of early vision are ill-posed; to recover unique, stable solutions, regularization techniques can be used. These techniques lead to meaningful results, provided that solutions belong to suitable compact sets. Often some additional constraints on the shape or the behavior of the possible solutions are available. This note discusses which of these constraints can be embedded in the classic theory of regularization, and how, in order to improve the quality of the recovered solution. Connections with mathematical programming techniques are also discussed. In conclusion, regularization of early vision problems may be improved by the use of some constraints on the shape of the solution (such as monotonicity and upper and lower bounds), when available.
|
10 |
On Edge Detection
Torre, V.; Poggio, T. 01 August 1984
Edge detection is the process that attempts to characterize the intensity changes in the image in terms of the physical processes that have originated them. A critical, intermediate goal of edge detection is the detection and characterization of significant intensity changes. This paper discusses this part of the edge detection problem. To characterize the types of intensity changes, derivatives of different types, and possibly different scales, are needed. Thus we consider this part of edge detection as a problem in numerical differentiation. We show that numerical differentiation of images is an ill-posed problem in the sense of Hadamard. Differentiation needs to be regularized by a regularizing filtering operation before differentiation. This shows that this part of edge detection consists of two steps, a filtering step and a differentiation step.
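The two-step structure named in this abstract, filtering followed by differentiation, can be sketched on a one-dimensional intensity profile. This is a minimal illustration under stated assumptions: the abstract does not specify the filter, so a Gaussian is assumed here, borders are handled by replicating the end samples, and all function names are hypothetical.

```python
import math


def gaussian_weights(sigma, radius):
    """Normalized Gaussian weights on the integer offsets -radius..radius."""
    w = [math.exp(-(k * k) / (2.0 * sigma * sigma))
         for k in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def smooth(signal, sigma):
    """Filtering step: convolve with a Gaussian (assumed regularizing filter)."""
    radius = int(3 * sigma) + 1
    w = gaussian_weights(sigma, radius)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, wk in zip(range(-radius, radius + 1), w):
            j = min(max(i + k, 0), n - 1)  # replicate border samples
            acc += wk * signal[j]
        out.append(acc)
    return out


def derivative(signal):
    """Differentiation step: central differences (clamped at the borders)."""
    n = len(signal)
    return [(signal[min(i + 1, n - 1)] - signal[max(i - 1, 0)]) / 2.0
            for i in range(n)]


def edge_strength(signal, sigma=1.0):
    """Filter first, then differentiate; return the derivative magnitude."""
    return [abs(d) for d in derivative(smooth(signal, sigma))]
```

Applied to a step profile, the strongest response falls at the step, while a constant profile produces no response; a larger `sigma` suppresses more noise at the cost of a broader, weaker peak.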
|