  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
91

Structural change, convergence and networks: theoretical and empirical analyses

Dutta, Aparna 08 April 2016
The dissertation consists of three chapters that study topics related to structural change, spatial and network data. The first chapter considers the problem of testing for a structural change in the spatial lag parameter in a panel model. We propose a likelihood ratio test of the null hypothesis of no change against the alternative hypothesis of a single change. The limiting distribution of the test is derived under the null hypothesis when the number of time periods is large. We also propose a break-date estimator to determine the location of the change point following evidence against the null hypothesis. We present Monte Carlo evidence to show that the proposed procedure performs well in finite samples. We use US state budget data to investigate changes in budget spillovers and the interdependence of fiscal policy within US states. The second chapter proposes a theory of cross-country migration in the form of labor mobility based on regional and sectoral productivity shocks in a multi-country, multi-sector setting. The productivity model, when applied to US state data, explains both the nominal and relative flow of workers across the U.S. well, which is taken as the frictionless benchmark. On the other hand, when applied to Europe, the model explains the relative flow network well but predicts a higher nominal flow. This missing mass of migrants is explained by socio-cultural-political barriers. We use dyadic regressions to assess the effects of institutional and cultural "distance" between countries in explaining the "European immobility puzzle". The third chapter shows that the "iron law" of convergence (2%) still holds for the world. We document a structural break in Africa's convergence rate and argue that Africa was not converging before 2000. The world convergence rate before 2000 was driven by Asian and Latin American countries. We show that recent institutional and infrastructural developments have put the African countries on the path of "catching up".
We use Least-Absolute-Shrinkage-and-Selection-Operator (LASSO) to select the variables and a double selection method to estimate the treatment effect in a partially linear model. We compare LASSO variable selections with those obtained using Gradient-Boosting-Method (GBM) and Random Forest.
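As a reading aid, the double selection step mentioned above can be written out as follows. This is a sketch of the standard post-double-selection recipe for a partially linear model, not necessarily the exact specification used in the dissertation; the symbols $y_i$, $d_i$, $x_i$, $g$, $m$, $\hat I_1$ and $\hat I_2$ are notation assumed here for illustration.

```latex
% Partially linear model: outcome y_i, treatment d_i, controls x_i (assumed notation)
\begin{align*}
  y_i &= \alpha\, d_i + g(x_i) + \varepsilon_i, \\
  d_i &= m(x_i) + v_i .
\end{align*}
% Double selection:
%   Step 1: LASSO of y on x selects the control set \hat I_1.
%   Step 2: LASSO of d on x selects the control set \hat I_2.
%   Step 3: OLS of y on d and the controls in \hat I_1 \cup \hat I_2
%           gives the treatment-effect estimate \hat\alpha.
\[
  \hat\alpha = \text{coefficient on } d_i \text{ in the OLS regression of } y_i
  \text{ on } \bigl(d_i,\; x_{i,\hat I_1 \cup \hat I_2}\bigr).
\]
```

Selecting controls from both equations guards against omitting a variable that is only weakly related to the outcome but strongly related to the treatment.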
92

Uma nova aplicação para o método Lasso: index tracking no mercado brasileiro (A new application of the Lasso method: index tracking in the Brazilian market)

Monteiro, Lucas da Silva January 2017
Given the evidence in the literature that active funds have generally not succeeded in beating their benchmarks, passive funds, which seek to reproduce the risk and return characteristics of a defined market index, have been gaining ground as an investment alternative in investors' portfolios. The strategy of reproducing an index is called index tracking. Accordingly, the objective of this study is to introduce the LASSO technique as an endogenous method of asset selection and optimization for index tracking in the Brazilian market, and to compare it with index tracking in which assets are selected by their weight in the benchmark index (optimized by cointegration). The use of the LASSO technique, as proposed, is new to the Brazilian financial market. The comparative tests were carried out with the stocks of the Ibovespa index between 2010 and 2016. Bearing in mind the limitations of the analysis period, the results suggest, among other points, that the LASSO method generates more volatile tracking errors than the traditional ad hoc method and therefore yields a replicating portfolio that adheres less closely to the benchmark over time.
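To make the idea of LASSO-based asset selection concrete, here is a minimal sketch in plain Python. It is not the thesis's implementation: the objective scaling, the cyclic coordinate-descent solver, the function names, and the toy "returns" are all assumptions chosen for illustration. The index is built as 0.6 of asset A plus 0.4 of asset B, while asset C is unrelated, so the penalty should drop C from the tracking portfolio.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the lasso proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    X is a list of n rows (one per period), each holding p asset returns;
    y is the list of n index returns. Names and scaling are illustrative.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of asset j with the residual that excludes asset j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Toy, standardised "returns" for three hypothetical assets A, B, C (columns).
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# Hypothetical index: 60% asset A, 40% asset B, nothing from asset C.
y = [0.6 * row[0] + 0.4 * row[1] for row in X]

weights = lasso_coordinate_descent(X, y, lam=0.1)
print(weights)  # asset C receives a weight of exactly zero
```

In this toy design the penalty also shrinks the weights of A and B below 0.6 and 0.4. A practical tracker would usually refit or rescale the selected assets afterwards, which this sketch leaves out.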
93

Ranked sparsity: a regularization framework for selecting features in the presence of prior informational asymmetry

Peterson, Ryan Andrew 01 May 2019
In this dissertation, we explore and illustrate the concept of ranked sparsity, a phenomenon that often occurs naturally in the presence of derived variables. Ranked sparsity arises in modeling applications when an expected disparity exists in the quality of information between different feature sets. Its presence can cause traditional model selection methods to fail because statisticians commonly presume that each potential parameter is equally worthy of entering into the final model - we call this principle "covariate equipoise". However, this presumption does not always hold, especially in the presence of derived variables. For instance, when all possible interactions are considered as candidate predictors, the presumption of covariate equipoise will often produce misclassified and opaque models. The sheer number of additional candidate variables grossly inflates the number of false discoveries in the interactions, resulting in unnecessarily complex and difficult-to-interpret models with many (truly spurious) interactions. We suggest a modeling strategy that requires a stronger level of evidence in order to allow certain variables (e.g. interactions) to be selected in the final model. This ranked sparsity paradigm can be implemented either with a modified Bayesian information criterion (RBIC) or with the sparsity-ranked lasso (SRL). In chapter 1, we provide a philosophical motivation for ranked sparsity by describing situations where traditional model selection methods fail. Chapter 1 also presents some of the relevant literature, and motivates why ranked sparsity methods are necessary in the context of interactions. Finally, we introduce RBIC and SRL as possible recourses. In chapter 2, we explore the performance of SRL relative to competing methods for selecting polynomials and interactions in a series of simulations. 
We show that the SRL is a very attractive method because it is fast, accurate, and does not tend to inflate the number of Type I errors in the interactions. We illustrate its utility in an application to predict the survival of lung cancer patients using a set of gene expression measurements and clinical covariates, searching in particular for gene-environment interactions, which are very difficult to find in practice. In chapter 3, we present three extensions of the SRL in very different contexts. First, we show how the method can be used to optimize for cost and prediction accuracy simultaneously when covariates have differing collection costs. In this setting, the SRL produces what we call "minimally invasive" models, i.e. models that can easily (and cheaply) be applied to new data. Second, we investigate the use of the SRL in the context of time series regression, where we evaluate our method against several other state-of-the-art techniques in predicting the hourly number of arrivals at the Emergency Department of the University of Iowa Hospitals and Clinics. Finally, we show how the SRL can be utilized to balance model stability and model adaptivity in an application which uses a rich new source of smartphone thermometer data to predict flu incidence in real time.
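One way to picture "requiring a stronger level of evidence" for some variables is a lasso whose penalty differs by variable type. The display below is a generic weighted-lasso sketch under that reading. The weights $w_j$, and the simple two-level choice shown, are illustrative assumptions and are not taken from the dissertation, which defines its own weighting.

```latex
% Weighted lasso: a larger w_j means more evidence is needed before \beta_j enters
\[
  \hat\beta \;=\; \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  \;+\; \lambda \sum_{j=1}^{p} w_j\,\lvert \beta_j \rvert ,
  \qquad
  w_j =
  \begin{cases}
    1, & \text{if } x_j \text{ is a main effect},\\
    c > 1, & \text{if } x_j \text{ is an interaction (illustrative choice)}.
  \end{cases}
\]
```

Setting every $w_j = 1$ recovers the ordinary lasso, which treats all candidate variables alike; that is the "covariate equipoise" presumption the abstract argues against.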
94

Passive detection of radionuclides from weak and poorly resolved gamma-ray energy spectra

Kump, Paul 01 July 2012
Large passive detectors used in screening for special nuclear materials at ports of entry are characterized by poor spectral resolution, making identification of radionuclides a difficult task. Most identification routines, which fit empirical shapes and use derivatives, are impractical in these situations. Here I develop new, physics-based methods to determine the presence of spectral signatures of one or more of a set of isotopes. Gamma-ray counts are modeled as Poisson processes, where the average part is taken to be the model and the difference between the observed gamma-ray counts and the average is considered random noise. In the linear part, the unknown coefficients represent the intensities of the isotopes. Therefore, it is of great interest not to estimate each coefficient, but rather to determine whether the coefficient is non-zero, corresponding to the presence of the isotope. This thesis provides new selection algorithms, and, since detector data is undoubtedly finite, this unique work emphasizes selection when data is fixed and finite.
95

Statistical inference in high dimensional linear and AFT models

Chai, Hao 01 July 2014
Variable selection procedures for high-dimensional data have been proposed and studied in a large body of literature over the last few years. Most of the previous research focuses on the selection properties as well as the point estimation properties. In this paper, our goal is to construct confidence intervals for some low-dimensional parameters in the high-dimensional setting. The models we study are the partially penalized linear and accelerated failure time (AFT) models in the high-dimensional setting. In our model setup, all variables are split into two groups. The first group consists of a relatively small number of variables that are of primary interest. The second group consists of a large number of variables that can be potentially correlated with the response variable. We propose an approach that selects the variables from the second group and produces confidence intervals for the parameters in the first group. We show the sign consistency of the selection procedure and give a bound on the estimation error. Based on this result, we provide sufficient conditions for the asymptotic normality of the low-dimensional parameters. The high-dimensional selection consistency and the low-dimensional asymptotic normality are developed for both linear and AFT models with high-dimensional data.
96

Marginal false discovery rate approaches to inference on penalized regression models

Miller, Ryan 01 August 2018
Data containing a large number of variables are becoming increasingly common, and sparsity-inducing penalized regression methods, such as the lasso, have become a popular analysis tool for these datasets due to their ability to naturally perform variable selection. However, quantifying the importance of the variables selected by these models is a difficult task. These difficulties are compounded by the tendency for the most predictive models, for example those chosen using procedures like cross-validation, to include a substantial number of noise variables with no real relationship with the outcome. To address the task of performing inference on penalized regression models, this thesis proposes false discovery rate approaches for a broad class of penalized regression models. This work includes the development of an upper bound for the number of noise variables in a model, as well as local false discovery rate approaches that quantify the likelihood of each individual selection being a false discovery. These methods are applicable to a wide range of penalties, such as the lasso, elastic net, SCAD, and MCP; a wide range of models, including linear regression, generalized linear models, and Cox proportional hazards models; and are also extended to the group regression setting under the group lasso penalty. In addition to studying these methods using numerous simulation studies, the practical utility of these methods is demonstrated using real data from several high-dimensional genome-wide association studies.
97

Langages de description de systèmes logiques : propositions pour une méthode formelle de définition (Logic system description languages: proposals for a formal definition method)

Borrione, Dominique 01 July 1981
A theoretical study aimed at identifying the principles common to the vast majority of logic system description languages. Presentation of CONLAN. Description of an evaluation model for specifying the interpretation of the primitives of a logic system description language.
98

Recovering Data with Group Sparsity by Alternating Direction Methods

Deng, Wei 06 September 2012
Group sparsity reveals underlying sparsity patterns and contains rich structural information in data. Hence, exploiting group sparsity will facilitate more efficient techniques for recovering large and complicated data in applications such as compressive sensing, statistics, signal and image processing, machine learning and computer vision. This thesis develops efficient algorithms for solving a class of optimization problems with group sparse solutions, where arbitrary group configurations are allowed and the mixed ℓ2,1-regularization is used to promote group sparsity. Such optimization problems can be quite challenging to solve due to the mixed-norm structure and possible grouping irregularities. We derive algorithms based on a variable splitting strategy and the alternating direction methodology. Extensive numerical results are presented to demonstrate the efficiency, stability and robustness of these algorithms, in comparison with the previously known state-of-the-art algorithms. We also extend the existing global convergence theory to allow more generality.
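Alternating-direction schemes for mixed-norm problems of this kind typically reduce one of their subproblems to a group-wise shrinkage step. The sketch below shows only that step, for non-overlapping groups, in plain Python; the function name and the toy vector are assumptions, and it is not the thesis's algorithm, which also handles arbitrary (including overlapping) group configurations.

```python
import math


def group_soft_threshold(x, groups, t):
    """Proximal operator of t * sum_g ||x_g||_2 for non-overlapping groups.

    x is a list of floats; groups is a list of index lists that do not overlap.
    Each group is scaled by max(0, 1 - t / ||x_g||_2), so a group whose norm is
    at most t is set to zero as a whole.
    """
    out = list(x)
    for idx in groups:
        norm = math.sqrt(sum(x[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - t / norm) if norm > 0.0 else 0.0
        for i in idx:
            out[i] = scale * x[i]
    return out


# Toy example: the first group is strong (norm 5), the second is weak (norm 0.5).
x = [3.0, 4.0, 0.3, -0.4]
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(x, groups, t=1.0)
print(shrunk)  # first group is scaled by 0.8, second group is zeroed entirely
```

The whole-group zeroing is what distinguishes group sparsity from the entry-wise sparsity produced by the ordinary ℓ1 soft threshold.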
99

Principal Components Analysis for Binary Data

Lee, Seokho May 2009
Principal components analysis (PCA) has been widely used as a statistical tool for the dimension reduction of multivariate data in various application areas and has been extensively studied in the long history of statistics. One limitation of the PCA machinery is that it can be applied only to continuous variables. Recent advances in information technology in various applied areas have created numerous large, diverse data sets with a high-dimensional feature space, including high-dimensional binary data. Despite this demand, only a few methodologies tailored to such binary data sets have been suggested. The methodologies we develop take a model-based approach to generalizing PCA to binary data. We develop a statistical model for binary PCA and propose two stable estimation procedures using the MM algorithm and a variational method. By incorporating a regularization technique, the selection of important variables is achieved automatically. We also propose an efficient algorithm for model selection, including the choice of the number of principal components and the regularization parameter.
100

Classification models for high-dimensional data with sparsity patterns

Tillander, Annika January 2013
Today's high-throughput data collection devices, e.g. spectrometers and gene chips, create information in abundance. However, this poses serious statistical challenges, as the number of features is usually much larger than the number of observed units. Further, in this high-dimensional setting, only a small fraction of the features are likely to be informative for any specific project. In this thesis, three different approaches to two-class supervised classification in this high-dimensional, low-sample-size setting are considered. There are classifiers that are known to mitigate the issues of high dimensionality, e.g. distance-based classifiers such as Naive Bayes. However, these classifiers are often computationally intensive, and they are less time-consuming for discrete data. Hence, continuous features are often transformed into discrete features. In the first paper, a discretization algorithm suitable for high-dimensional data is suggested and compared with other discretization approaches. Further, the effect of discretization on misclassification probability in the high-dimensional setting is evaluated. Linear classifiers are more stable, which motivates adjusting the linear discriminant procedure to the high-dimensional setting. In the second paper, a two-stage estimation procedure for the inverse covariance matrix, applying Lasso-based regularization and Cuthill-McKee ordering, is suggested. The estimation gives a block-diagonal approximation of the covariance matrix, which in turn leads to an additive classifier. In the third paper, an asymptotic framework that represents sparse and weak block models is derived and a technique for block-wise feature selection is proposed. Probabilistic classifiers have the advantage of providing the probability of membership in each class for new observations rather than simply assigning to a class. In the fourth paper, a method is developed for constructing a Bayesian predictive classifier.
Given the block-diagonal covariance matrix, the resulting Bayesian predictive and marginal classifier provides an efficient solution to the high-dimensional problem by splitting it into smaller tractable problems. The relevance and benefits of the proposed methods are illustrated using both simulated and real data. / With today's technology, such as spectrometers and gene chips, data are generated in large quantities. This abundance of data is not only an advantage but also causes certain problems: the number of variables (p) is usually considerably larger than the number of observations (n). This gives so-called high-dimensional data, which require new statistical methods, since the traditional methods were developed for the opposite situation (p<n). Moreover, usually very few of all these variables are relevant to any given project, and the strength of the information in the relevant variables is often weak. Hence this type of data is often described as sparse and weak. Identifying the relevant variables is commonly likened to finding a needle in a haystack. This thesis addresses three different ways of classifying this type of high-dimensional data. Here, classifying means using a dataset that contains both explanatory variables and an outcome variable to teach a function or algorithm to predict the outcome variable from the explanatory variables alone. The real data used in the thesis are microarrays, that is, cell samples that show the activity of the genes in the cell. The goal of the classification is to use the variation in activity among the thousands of genes (the explanatory variables) to determine whether the cell sample comes from cancer tissue or normal tissue (the outcome variable). There are classification methods that can handle high-dimensional data, but these are often computationally intensive and therefore often work better for discrete data.
Transforming continuous variables into discrete ones (discretization) can reduce computation time and make classification more efficient. The thesis studies how discretization affects the prediction accuracy of the classification, and a very efficient discretization method for high-dimensional data is proposed. Linear classification methods have the advantage of being stable. The drawback is that they require an invertible covariance matrix, which the covariance matrix is not for high-dimensional data. The thesis proposes a way to estimate the inverse of sparse covariance matrices using a block-diagonal matrix. This matrix has the further advantage of leading to additive classification, which makes it possible to select whole blocks of relevant variables. The thesis also presents a method for identifying and selecting the blocks. There are also probabilistic classification methods, which have the advantage of giving the probability of belonging to each of the possible outcomes for an observation, unlike most other classification methods, which only predict the outcome. The thesis proposes such a Bayesian method, given the block-diagonal matrix and normally distributed outcome classes. The relevance and advantages of the methods proposed in the thesis are demonstrated by applying them to simulated and real high-dimensional data.
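The link between a block-diagonal covariance matrix and an additive classifier, stated in the abstract above, can be seen from the usual two-class linear discriminant score. The notation here (class means $\mu_0$, $\mu_1$, common covariance $\Sigma$ with blocks $\Sigma_1, \dots, \Sigma_B$, and the subscript $b$ for the sub-vector belonging to block $b$) is assumed for illustration and may differ from the thesis's own.

```latex
% Two-class linear discriminant score with common covariance \Sigma
\[
  \delta(x) \;=\; \Bigl(x - \tfrac{1}{2}(\mu_0 + \mu_1)\Bigr)^{\!\top}
  \Sigma^{-1} (\mu_1 - \mu_0).
\]
% If \Sigma = \mathrm{diag}(\Sigma_1, \dots, \Sigma_B) is block diagonal, then
% \Sigma^{-1} = \mathrm{diag}(\Sigma_1^{-1}, \dots, \Sigma_B^{-1}) and the score
% splits into one term per block:
\[
  \delta(x) \;=\; \sum_{b=1}^{B}
  \Bigl(x_b - \tfrac{1}{2}(\mu_{0,b} + \mu_{1,b})\Bigr)^{\!\top}
  \Sigma_b^{-1} (\mu_{1,b} - \mu_{0,b}).
\]
```

Each term involves only a small matrix $\Sigma_b$, which can be inverted even when the full sample covariance matrix cannot, and a block whose term contributes little can be dropped as a whole.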
