  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
221

Aplicação de modelos teóricos de distribuição de abundância das espécies na avaliação de efeitos de fragmentação sobre as comunidades de aves da Mata Atlântica / Use of the species abundance distributions to evaluate the effects of fragmentation on the bird communities of Atlantic Forest

Camila Yumi Mandai 26 October 2010
As distribuições de abundância relativa das espécies tiveram um papel importante no desenvolvimento da ecologia de comunidades, revelando um dos padrões mais bem estabelecidos da ecologia, que é a alta dominância de algumas espécies nas comunidades biológicas. Este padrão provocou a criação de dezenas de modelos teóricos na tentativa de explicar quais mecanismos ecológicos poderiam gerá-lo. Os modelos teóricos de abundância relativa das espécies podem ser vistos como descritores das comunidades, e seus parâmetros, medidas sintéticas de dimensões da diversidade. Esses parâmetros podem ser utilizados não só como descritores biologicamente interpretáveis das comunidades, mas também como variáveis respostas a possíveis fatores ambientais que afetam as comunidades. Adotando então esta aplicação descritiva dos modelos, nosso objetivo foi comparar as comunidades de aves de áreas em um gradiente de fragmentação, utilizando como variável resposta os valores estimados do parâmetro do modelo série logarítmica, o α de Fisher. Como todos os modelos teóricos de abundância relativa propostos têm como premissa, a igualdade de probabilidade de captura entre as espécies, o que para comunidades de espécies de organismos móveis, como aves, parece pouco realista, neste trabalho investigamos também o grau de sensibilidade dos modelos quanto à quebra dessa premissa. Assim, por meio de simulações de comunidades, analisamos o viés de seleção e estimação, e revelamos que o aumento do grau de heterogeneidade entre as probabilidades de captura das espécies acarreta no incremento do viés de seleção do modelo real e também de estimação dos parâmetros. Porém, como o objetivo do estudo era identificar os fatores que influenciam a diversidade das comunidades, mesmo com o viés de estimação, talvez ainda fosse possível revelar o grau de influência sobre os valores dos parâmetros, quando ele existir. 
Assim, prosseguimos com mais uma etapa de simulações, em que geramos comunidades cujos valores de parâmetros tinham uma relação linear com a área dos fragmentos. O que encontramos é que independente da igualdade ou desigualdade de capturabilidade das espécies, quando o efeito existe, ele é sempre detectado, porém dependendo do grau de diferença de probabilidade de captura das espécies, o efeito pode ser subestimado. E, na ausência de efeito, ele pode ser falsamente detectado, dependendo do grau de heterogeneidade de probabilidades de captura entre as espécies, mas sempre com estimativas bem baixas para o efeito inexistente. Com esses resultados então, pudemos quantificar os tipos de efeitos da heterogeneidade de probabilidades de captura e prosseguir com as análises dos efeitos de fragmentação. O que nossos resultados mostraram é que na paisagem com 10% de cobertura vegetal, a área parece influenciar a diversidade dos fragmentos mais que o isolamento, e que na paisagem de 50% de cobertura vegetal, a variável de isolamento se torna mais importante que a área para explicar os dados. Porém, em uma interpretação mais parcimoniosa, consideramos as estimativas dos efeitos muito baixas para considerar que ele de fato existia. Com isso, concluímos que o processo de fragmentação provavelmente não tem efeito sobre a hierarquia de abundância das espécies, e é independente da porcentagem de cobertura vegetal da paisagem. Contudo, em uma descrição do número de capturas de cada espécie nos fragmentos, ponderada pelo número de capturas amostrado em áreas contínuas adjacentes, revelaram que o tamanho do fragmento pode ser importante na determinação de quais espécies serão extintas ou beneficiadas e que talvez a qualidade da matriz seja decisiva para a manutenção de espécies altamente sensíveis em fragmentos pequenos. 
Assim, demonstramos que, embora as SADs sejam pouco afetadas pela fragmentação, a posição das espécies na hierarquia de abundâncias pode mudar muito, o que reflete as diferenças de sensibilidade das espécies a área e isolamento dos fragmentos. / Species abundance distributions (SADs) have played an important role in community ecology, revealing one of the best-established patterns in ecology: the high dominance of just a few species. This pattern stimulated the proposal of numerous theoretical models in an attempt to explain the ecological mechanisms that could generate it. However, these models can also serve as descriptors of communities, and their parameters as synthetic measures of diversity. Such parameters can be used as response variables to environmental factors affecting communities. Adopting this approach, our objective was to compare bird communities across areas with different levels of fragmentation, using as the response variable the estimates of α, the parameter of Fisher's logseries. Because SAD models implicitly assume equal capture probabilities among species, which seems unrealistic for mobile organisms such as birds, we also investigated how sensitive the models are to violations of this assumption. By simulating communities in which species had equal or different capture probabilities, we found that increasing heterogeneity in species catchability increases the bias in both model selection and parameter estimation. Nevertheless, since our goal was to identify factors that may influence community diversity, it might still be possible to test the relation even with these biases, provided they were constant. We therefore proceeded to another stage of simulations, in which we generated communities whose parameter values had a linear relationship with fragment area. 
We found that, regardless of whether species catchability is equal or unequal, the effect is always detected when it exists, but depending on the degree of difference in capture probabilities among species, it may be underestimated. Further, when there is no effect, one can be falsely detected, depending on the degree of heterogeneity of capture probabilities among species, but always with very low estimates for the nonexistent effect. With these results, we could quantify the effects of heterogeneity in capture probabilities and proceed with the analysis of the effects of fragmentation. Our results showed that in the landscape with 10% vegetation cover, fragment area appears to influence the diversity of the fragments more than isolation does, whereas in the landscape with 50% vegetation cover, isolation becomes more important than area in explaining the data. In a more parsimonious interpretation, however, we consider the estimated effects too low to conclude that they actually exist. Therefore, we conclude that the fragmentation process probably has no effect on the hierarchy of species abundances, regardless of the percentage of vegetation cover in the landscape. However, a description of the number of captures of each species in the fragments, weighted by the number of captures sampled in adjacent continuous areas, revealed that fragment size may be important in determining which species will go extinct or benefit, and that matrix quality may be decisive for the maintenance of highly sensitive species in small fragments. Thus, we demonstrated that although SADs are little affected by fragmentation, the position of species in the abundance hierarchy can change considerably, which reflects differences in species' sensitivity to fragment area and isolation.
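The abstract above uses Fisher's α as its response variable. As a rough illustration only (not the author's code), α can be obtained from the number of species S and the number of individuals N by solving S = α ln(1 + N/α) numerically. The function name `fisher_alpha` and the bisection approach are assumptions made for this sketch.

```python
import math


def fisher_alpha(n_species: float, n_individuals: float, tol: float = 1e-10) -> float:
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection.

    Illustrative sketch: the name and the solver are assumptions,
    not taken from the thesis.
    """
    if not 0 < n_species < n_individuals:
        raise ValueError("need 0 < S < N")

    def expected_species(alpha: float) -> float:
        # Expected species count under the logseries for a given alpha.
        return alpha * math.log1p(n_individuals / alpha)

    # expected_species is increasing in alpha and tends to N,
    # so doubling the upper bound eventually brackets the root.
    lo, hi = 1e-9, 1.0
    while expected_species(hi) < n_species:
        hi *= 2.0

    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_species(mid) < n_species:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `fisher_alpha(46.15, 1000)` should return a value close to 10, since 10 · ln(101) ≈ 46.15.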
222

Dinâmica da vegetação arbórea na borda de remanescentes florestais e sua relação com características da paisagem no norte do Estado do Paraná / Arboreal vegetation dynamics at forest edges and its relations with landscape features in the northern Paraná State

Bruno Rodrigues Ginciene 20 October 2014
Os efeitos de borda e a alteração da estrutura das paisagens constituem consequências negativas da fragmentação florestal responsáveis por transformações nos processos ecológicos. Decorrentes da expansão desordenada de atividades antrópicas, estas alterações podem comprometer o futuro dos remanescentes florestais e a manutenção dos recursos naturais na superfície terrestre. Nesta dissertação a dinâmica da vegetação arbórea foi analisada em oito transectos perpendiculares às bordas de seis remanescentes florestais entre 1996 e 2012. As paisagens do entorno destes transectos foram caracterizadas a partir de imagens orbitais de 1995 e 2011 para a verificação das mudanças ocorridas no uso do solo e para a investigação da influência de seus parâmetros físicos e estruturais sobre as taxas de mortalidade e recrutamento de espécies. Os resultados indicaram que, ao longo do tempo, a influência das bordas se pronunciou em direção ao interior dos remanescentes florestais, enquanto que o contraste entre a borda e o interior se atenuou. A distância média da borda das espécies: pioneiras/iniciais, anemocóricas e de dossel foi significativamente maior em 2012 do que em 1996. A comunidade arbórea apresentou menor similaridade em sua composição ao longo do tempo a menores distâncias da borda. Apesar da dinâmica verificada no uso do solo, a proporcionalidade dos parâmetros físicos e estruturais das paisagens se manteve entre 1995 e 2011. De maneira geral, estes parâmetros apresentaram pouca influência sobre a dinâmica da comunidade arbórea. Apenas as taxas de mortalidade das espécies exóticas e as taxas de recrutamento das espécies pioneiras/inicias apresentam forte relação com o tamanho e o número dos fragmentos florestais nas paisagens. 
Estes resultados indicam que os efeitos de borda precisam ser atenuados e que o contexto das paisagens deve ser incorporado às estratégias conservacionistas para que estas sejam efetivas e o futuro dos remanescentes florestais não seja comprometido. / Edge effects and alterations in landscape structure are among the negative consequences of forest fragmentation responsible for changes in ecological processes on the earth's surface. Originating from the disorderly expansion of anthropogenic activities, these alterations may endanger the future of the remaining forest patches and the maintenance of natural resources. This dissertation analyzes the vegetation dynamics at forest edges and its relations with landscape features. The vegetation dynamics were examined along eight perpendicular-to-edge transects within six forest patches, and changes in the distribution and composition of the arboreal community were assessed between 1996 and 2012. The landscapes surrounding the analyzed transects were characterized from 1995 and 2011 orbital images, and their land use changes were evaluated. The influence of landscape structure and physical parameters on species recruitment and mortality was analyzed. The results indicated that the distance of edge influence increased over time while its magnitude was attenuated. The average distance from the edge of pioneer/early-successional, wind-dispersed, and canopy species was significantly larger in 2012 than in 1996. Over time, species composition was less similar closer to the edges. Despite the observed land use changes in the landscapes surrounding the edge transects, the proportionality of landscape structure and physical parameters was maintained between 1995 and 2011. Overall, the arboreal community dynamics were poorly associated with landscape features. 
A strong relation was found only between the mortality rates of exotic species and the recruitment rates of pioneer/early-successional species, on the one hand, and the size and number of forest patches within the landscapes, on the other. These results indicate that, to be effective, conservation planning must tackle edge effects and incorporate the landscape context; otherwise it will fail to secure the future of forest patches.
223

Model selection for discrete Markov random fields on graphs / Seleção de modelos para campos aleatórios Markovianos discretos sobre grafos

Iara Moreira Frondana 28 June 2016
In this thesis we propose a penalized maximum conditional likelihood criterion to estimate the graph of a general discrete Markov random field. We prove the almost sure convergence of the graph estimator in the case of a finite or countably infinite set of variables. Our method requires minimal assumptions on the probability distribution and, contrary to other approaches in the literature, does not need the usual positivity condition. We present several examples with a finite set of vertices and study the performance of the estimator on simulated data from these examples. We also introduce an empirical procedure based on k-fold cross-validation to select the best value of the constant in the estimator's definition and show the application of this method to two real datasets. / Nesta tese propomos um critério de máxima verossimilhança penalizada para estimar o grafo de dependência condicional de um campo aleatório Markoviano discreto. Provamos a convergência quase certa do estimador do grafo no caso de um conjunto finito ou infinito enumerável de variáveis. Nosso método requer condições mínimas na distribuição de probabilidade e contrariamente a outras abordagens da literatura, a condição usual de positividade não é necessária. Introduzimos alguns exemplos com um conjunto finito de vértices e estudamos o desempenho do estimador em dados simulados desses exemplos. Também propomos um procedimento empírico baseado no método de validação cruzada para selecionar o melhor valor da constante na definição do estimador, e mostramos a aplicação deste procedimento em dois conjuntos de dados reais.
224

Strategies for Combining Tree-Based Ensemble Models

Zhang, Yi 01 January 2017
Ensemble models have proved effective in a variety of classification tasks. These models combine the predictions of several base models to achieve higher out-of-sample classification accuracy than the base models. Base models are typically trained using different subsets of training examples and input features. Ensemble classifiers are particularly effective when their constituent base models are diverse in terms of their prediction accuracy in different regions of the feature space. This dissertation investigated methods for combining ensemble models, treating them as base models. The goal was to develop a strategy for combining ensemble classifiers that results in higher classification accuracy than the constituent ensemble models. Three of the best-performing tree-based ensemble methods (random forest, extremely randomized trees, and eXtreme gradient boosting) were used to generate a set of base models. Outputs from classifiers generated by these methods were then combined to create an ensemble classifier. This dissertation systematically investigated methods for (1) selecting a set of diverse base models, and (2) combining the selected base models. The methods were evaluated using public domain data sets that have been extensively used for benchmarking classification models. The research established that the best ensemble approach was to apply random forest as the final ensemble method, integrating the selected base models with factor scores from multiple correspondence analysis.
225

Bayesian inference in aggregated hidden Markov models

Marklund, Emil January 2015
Single-molecule experiments study the kinetics of molecular biological systems. Many such studies generate data that can be described by aggregated hidden Markov models, so there is a need for inference on such data and models. In this study, model selection in aggregated hidden Markov models was performed using maximum Bayesian evidence as the criterion. Variational Bayes inference was seen to underestimate the evidence for aggregated model fits. Estimating the evidence integral by brute-force Monte Carlo integration always converges to the correct value in theory, but far too slowly to be tractable. Nested sampling is a promising method for solving this problem through faster Monte Carlo integration, but here it was seen to have difficulties generating uncorrelated samples.
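The brute-force Monte Carlo estimate of the evidence mentioned above can be illustrated on a toy problem. The sketch below is not the hidden Markov setting of the thesis: it assumes a binomial likelihood with a uniform prior on the success probability, for which the exact evidence is 1/(n + 1), and it averages the likelihood over prior draws. All function names are hypothetical.

```python
import math
import random


def binomial_likelihood(theta: float, heads: int, flips: int) -> float:
    """Probability of observing `heads` successes in `flips` trials."""
    return math.comb(flips, heads) * theta**heads * (1.0 - theta) ** (flips - heads)


def evidence_monte_carlo(heads: int, flips: int, n_samples: int, seed: int = 0) -> float:
    """Estimate the evidence Z = E_prior[L(theta)] by averaging the
    likelihood over draws from a Uniform(0, 1) prior (toy assumption)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.random()  # one draw from the prior
        total += binomial_likelihood(theta, heads, flips)
    return total / n_samples


def evidence_exact(flips: int) -> float:
    """Closed form for this toy model: every head count is equally likely."""
    return 1.0 / (flips + 1)
```

In one dimension the average settles quickly. In models with many parameters the likelihood mass occupies a tiny fraction of the prior volume, which is consistent with the intractable convergence time reported in the abstract.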
226

Approche pour la construction de modèles d'estimation réaliste de l'effort/coût de projet dans un environnement incertain : application au domaine du développement logiciel / Approach to build realistic models for estimating project effort/cost in an uncertain environment : application to the software development field

Laqrichi, Safae 17 December 2015
L'estimation de l'effort de développement logiciel est l'une des tâches les plus importantes dans le management de projets logiciels. Elle constitue la base pour la planification, le contrôle et la prise de décision. La réalisation d'estimations fiables en phase amont des projets est une activité complexe et difficile du fait, entre autres, d'un manque d'informations sur le projet et son avenir, de changements rapides dans les méthodes et technologies liées au domaine logiciel et d'un manque d'expérience avec des projets similaires. De nombreux modèles d'estimation existent, mais il est difficile d'identifier un modèle performant pour tous les types de projets et applicable à toutes les entreprises (différents niveaux d'expérience, technologies maitrisées et pratiques de management de projet). Globalement, l'ensemble de ces modèles formule l'hypothèse forte que (1) les données collectées sont complètes et suffisantes, (2) les lois reliant les paramètres caractérisant les projets sont parfaitement identifiables et (3) que les informations sur le nouveau projet sont certaines et déterministes. Or, dans la réalité du terrain cela est difficile à assurer. Deux problématiques émergent alors de ces constats : comment sélectionner un modèle d'estimation pour une entreprise spécifique ? et comment conduire une estimation pour un nouveau projet présentant des incertitudes ? Les travaux de cette thèse s'intéressent à répondre à ces questions en proposant une approche générale d'estimation. Cette approche couvre deux phases : une phase de construction du système d'estimation et une phase d'utilisation du système pour l'estimation de nouveaux projets. 
La phase de construction du système d'estimation est composée de trois processus : 1) évaluation et comparaison fiable de différents modèles d'estimation, et sélection du modèle d'estimation le plus adéquat, 2) construction d'un système d'estimation réaliste à partir du modèle d'estimation sélectionné et 3) utilisation du système d'estimation dans l'estimation d'effort de nouveaux projets caractérisés par des incertitudes. Cette approche intervient comme un outil d'aide à la décision pour les chefs de projets dans l'aide à l'estimation réaliste de l'effort, des coûts et des délais de leurs projets logiciels. L'implémentation de l'ensemble des processus et pratiques développés dans le cadre de ces travaux ont donné naissance à un prototype informatique open-source. Les résultats de cette thèse s'inscrivent dans le cadre du projet ProjEstimate FUI13. / Software effort estimation is one of the most important tasks in the management of software projects. It is the basis for planning, control and decision making. Achieving reliable estimates in the upstream phases of projects is a complex and difficult activity because of, among other things, the lack of information about the project and its future, rapid changes in the methods and technologies of the software field, and a lack of experience with similar projects. Many estimation models exist, but it is difficult to identify a model that performs well for all types of projects and is applicable to all companies (different levels of experience, mastered technologies and project management practices). Overall, all of these models make the strong assumption that (1) the data collected are complete and sufficient, (2) the laws linking the parameters that characterize the projects are fully identifiable and (3) the information on the new project is certain and deterministic. 
However, in practice, this is difficult to ensure. Two problems then emerge from these observations: how to select an estimation model for a specific company, and how to conduct an estimate for a new project that presents uncertainties? The work of this thesis addresses these questions by proposing a general estimation framework. This framework covers two phases: a construction phase of the estimation system and a usage phase of the system for estimating new projects. The construction phase of the estimation system consists of three processes: 1) reliable evaluation and comparison of different estimation models, followed by selection of the most suitable one, 2) construction of a realistic estimation system from the selected estimation model and 3) use of the estimation system to estimate the effort of new projects characterized by uncertainties. This approach serves as a decision-support tool that helps project managers produce realistic estimates of the effort, cost and time of their software projects. The implementation of all the processes and practices developed in this work has given rise to an open-source software prototype. The results of this thesis are part of the ProjEstimate FUI13 project.
227

Estimation par tests / Estimation via testing

Sart, Mathieu 25 November 2013
Cette thèse porte sur l'estimation de fonctions à l'aide de tests dans trois cadres statistiques différents. Nous commençons par étudier le problème de l'estimation des intensités de processus de Poisson avec covariables. Nous démontrons un théorème général de sélection de modèles et en déduisons des bornes de risque non-asymptotiques sous des hypothèses variées sur la fonction à estimer. Nous estimons ensuite la densité de transition d'une chaîne de Markov homogène et proposons pour cela deux procédures. La première, basée sur la sélection d'estimateurs constants par morceaux, permet d'établir une inégalité de type oracle sous des hypothèses minimales sur la chaîne de Markov. Nous en déduisons des vitesses de convergence uniformes sur des boules d'espaces de Besov inhomogènes et montrons que l'estimateur est adaptatif par rapport à la régularité de la densité de transition. La performance de l'estimateur est aussi évalué en pratique grâce à des simulations numériques. La seconde procédure peut difficilement être implémenté en pratique mais permet d'obtenir un résultat général de sélection de modèles et d'en déduire des vitesses de convergence sous des hypothèses plus générales sur la densité de transition. Finalement, nous proposons un nouvel estimateur paramétrique d'une densité. Son risque est contrôlé sous des hypothèses pour lesquelles la méthode du maximum de vraisemblance peut ne pas fonctionner. Les simulations montrent que ces deux estimateurs sont très proches lorsque le modèle est vrai et suffisamment régulier. Il est cependant robuste, contrairement à l'estimateur du maximum de vraisemblance. / This thesis deals with the estimation of functions from tests in three statistical settings. We begin by studying the problem of estimating the intensities of Poisson processes with covariates. We prove a general model selection theorem from which we derive non-asymptotic risk bounds under various assumptions on the target function. 
We then propose two procedures to estimate the transition density of a homogeneous Markov chain. The first one selects an estimator among a collection of piecewise constant estimators. The selected estimator is shown to satisfy an oracle-type inequality under minimal assumptions on the Markov chain, which allows us to deduce uniform rates of convergence over balls of inhomogeneous Besov spaces. Moreover, the estimator is adaptive with respect to the smoothness of the transition density. We also evaluate the performance of the estimator in practice by carrying out numerical simulations. The second procedure is only of theoretical interest but yields a general model selection theorem from which we derive rates of convergence under more general assumptions on the transition density. Finally, we propose a new parametric estimator of a density. We upper-bound its risk under assumptions for which the maximum likelihood method may not work. The simulations show that the new estimator and the maximum likelihood estimator are very close when the model is true and regular enough. However, contrary to the maximum likelihood estimator, the new estimator is robust.
228

Novel pharmacometric methods to improve clinical drug development in progressive diseases / Place de nouvelles approches pharmacométriques pour optimiser le développement clinique des médicaments dans le secteur des maladies progressives

Buatois, Simon 26 November 2018
Suite aux progrès techniques et méthodologiques dans le secteur de la modélisation, l’apport de ces approches est désormais reconnu par l’ensemble des acteurs de la recherche clinique et pourrait avoir un rôle clé dans la recherche sur les maladies progressives. Parmi celles-ci les études pharmacométriques (PMX) sont rarement utilisées pour répondre aux hypothèses posées dans le cadre d’études dites de confirmation. Parmi les raisons évoquées, les analyses PMX traditionnelles ignorent l'incertitude associée à la structure du modèle lors de la génération d'inférence statistique. Or, ignorer l’étape de sélection du modèle peut aboutir à des intervalles de confiance trop optimistes et à une inflation de l’erreur de type I. Pour y remédier, nous avons étudié l’apport d’approches PMX innovantes dans les études de choix de dose. Le « model averaging » couplée à un test du rapport de « vraisemblance combiné » a montré des résultats prometteurs et tend à promouvoir l’utilisation de la PMX dans les études de choix de dose. Pour les études dites d’apprentissage, les approches de modélisation sont utilisées pour accroitre les connaissances associées aux médicaments, aux mécanismes et aux maladies. Dans cette thèse, les mérites de l’analyse PMX ont été évalués dans le cadre de la maladie de Parkinson. En combinant la théorie des réponses aux items à un modèle longitudinal, l’analyse PMX a permis de caractériser adéquatement la progression de la maladie tout en tenant compte de la nature composite du biomarqueur. Pour conclure, cette thèse propose des méthodes d’analyses PMX innovantes pour faciliter le développement des médicaments et/ou les décisions des autorités réglementaires. / In the mid-1990s, model-based approaches were mainly used as supporting tools for drug development. Restricted to the “rescue mode” in situations of drug development failure, the impact of model-based approaches was relatively limited. 
Nowadays, the merits of these approaches are widely recognised by stakeholders in healthcare, and they have a crucial role in drug development for progressive diseases. Despite their numerous advantages, model-based approaches present important drawbacks that limit their use in confirmatory trials. Traditional pharmacometric (PMX) analyses rely on model selection and consequently ignore model structure uncertainty when generating statistical inference. Ignoring the model selection step can lead to over-optimistic confidence intervals and to inflation of the type I error. Two projects of this thesis investigated the value of innovative PMX approaches for addressing part of these shortcomings in a hypothetical dose-finding study for a progressive disorder. The model averaging approach coupled with a combined likelihood ratio test showed promising results and represents an additional step towards the use of PMX for primary analysis in dose-finding studies. In the learning phase, PMX is a key discipline with applications at every stage of drug development, used to gain insight into drug, mechanism and disease characteristics with the ultimate goal of aiding efficient drug development. In this thesis, the merits of PMX analysis were evaluated in the context of Parkinson’s disease. An item-response theory longitudinal model was successfully developed to describe precisely the disease progression of Parkinson’s disease patients while acknowledging the composite nature of a patient-reported outcome. To conclude, this thesis proposes innovative PMX analysis methods to facilitate drug development and/or regulatory decisions.
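Model averaging, as mentioned in the abstract above, replaces the choice of a single model with a weighted combination of candidates. The abstract does not say how the weights are computed. The sketch below assumes the common Akaike-weight scheme, w_i ∝ exp(−ΔAIC_i / 2), purely for illustration, and the function names are hypothetical.

```python
import math


def akaike_weights(aic_values: list[float]) -> list[float]:
    """Turn AIC scores into normalized model weights.

    Assumption for illustration: weights proportional to
    exp(-0.5 * (AIC_i - AIC_min)); the thesis may weight differently.
    """
    best = min(aic_values)
    raw = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(raw)
    return [r / total for r in raw]


def model_averaged_prediction(predictions: list[float], weights: list[float]) -> float:
    """Weighted average of per-model predictions (e.g. a dose effect)."""
    return sum(p * w for p, w in zip(predictions, weights))
```

With two candidate models whose AIC values are 100 and 102, the weights come out at roughly 0.73 and 0.27, so the weaker model still contributes to the averaged estimate instead of being discarded.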
229

Sélection de modèles parcimonieux pour l’apprentissage statistique en grande dimension / Model selection for sparse high-dimensional learning

Mattei, Pierre-Alexandre 26 October 2017
Le déferlement numérique qui caractérise l’ère scientifique moderne a entraîné l’apparition de nouveaux types de données partageant une démesure commune : l’acquisition simultanée et rapide d’un très grand nombre de quantités observables. Qu’elles proviennent de puces ADN, de spectromètres de masse ou d’imagerie par résonance nucléaire, ces bases de données, qualifiées de données de grande dimension, sont désormais omniprésentes, tant dans le monde scientifique que technologique. Le traitement de ces données de grande dimension nécessite un renouvellement profond de l’arsenal statistique traditionnel, qui se trouve inadapté à ce nouveau cadre, notamment en raison du très grand nombre de variables impliquées. En effet, confrontée aux cas impliquant un plus grand nombre de variables que d’observations, une grande partie des techniques statistiques classiques est incapable de donner des résultats satisfaisants. Dans un premier temps, nous introduisons les problèmes statistiques inhérents aux modelés de données de grande dimension. Plusieurs solutions classiques sont détaillées et nous motivons le choix de l’approche empruntée au cours de cette thèse : le paradigme bayésien de sélection de modèles. Ce dernier fait ensuite l’objet d’une revue de littérature détaillée, en insistant sur plusieurs développements récents. Viennent ensuite trois chapitres de contributions nouvelles à la sélection de modèles en grande dimension. En premier lieu, nous présentons un nouvel algorithme pour la régression linéaire bayésienne parcimonieuse en grande dimension, dont les performances sont très bonnes, tant sur données réelles que simulées. Une nouvelle base de données de régression linéaire est également introduite : il s’agit de prédire la fréquentation du musée d’Orsay à l’aide de données vélibs. Ensuite, nous nous penchons sur le problème de la sélection de modelés pour l’analyse en composantes principales (ACP). 
En nous basant sur un résultat théorique nouveau, nous effectuons les premiers calculs exacts de vraisemblance marginale pour ce modelé. Cela nous permet de proposer deux nouveaux algorithmes pour l’ACP parcimonieuse, un premier, appelé GSPPCA, permettant d’effectuer de la sélection de variables, et un second, appelé NGPPCA, permettant d’estimer la dimension intrinsèque de données de grande dimension. Les performances empiriques de ces deux techniques sont extrêmement compétitives. Dans le cadre de données d’expression ADN notamment, l’approche de sélection de variables proposée permet de déceler sans supervision des ensembles de gènes particulièrement pertinents. / The numerical surge that characterizes the modern scientific era has led to the rise of new kinds of data that share one common excess: the simultaneous acquisition of a large number of measurable quantities. Whether coming from DNA microarrays, mass spectrometers, or nuclear magnetic resonance, these data, usually called high-dimensional, are now ubiquitous in both the scientific and technological worlds. Processing these data calls for a major renewal of the traditional statistical toolset, which is unfit for frameworks that involve such a large number of variables. Indeed, when the number of variables exceeds the number of observations, most traditional statistical techniques fail to give satisfactory results. First, we give a brief overview of the statistical issues that arise with high-dimensional data. Several popular solutions are presented, and we give some arguments in favor of the method used and advocated in this thesis: Bayesian model uncertainty. This framework is the subject of a detailed review that emphasizes several recent developments. After these surveys come three original contributions to high-dimensional model selection. A new algorithm for high-dimensional sparse regression called SpinyReg is presented. It compares favorably to state-of-the-art methods on both real and synthetic data sets. 
A new data set for high-dimensional regression is also described: it involves predicting the number of visitors to the Orsay museum in Paris using bike-sharing data. We focus next on model selection for high-dimensional principal component analysis (PCA). Using a new theoretical result, we derive the first closed-form expression of the marginal likelihood of a PCA model. This allows us to propose two algorithms for model selection in PCA: the first, called globally sparse probabilistic PCA (GSPPCA), performs scalable variable selection, and the second, called normal-gamma probabilistic PCA (NGPPCA), estimates the intrinsic dimensionality of a high-dimensional data set. Both methods are competitive with other popular approaches. In particular, using unlabeled DNA microarray data, GSPPCA is able to select genes that are more biologically relevant than those selected by several popular approaches.
230

Logistic Regression for Prospectivity Modeling

Kost, Samuel 02 December 2020
The thesis proposes a method for automated model selection using a logistic regression model in the context of prospectivity modeling, i.e. the exploration of mineralisations. This kind of data is characterized by a rare positive event and a large dataset. We adapted and combined two statistical measures, the Wald statistic and the Bayesian information criterion, making them suitable for processing the large data and the high number of variables that emerge in the nonlinear setting of logistic regression. The models obtained with our method are parsimonious, allowing for interpretation and information gain. The advantages of our method are shown by comparing it to another model selection method and to artificial neural networks on several datasets. Furthermore, we introduce a way to incorporate spatial dependencies, which are important in such geological settings.
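The two measures named in the abstract above have standard textbook definitions, sketched below for reference. How the thesis adapts and combines them is not stated in the abstract, so this shows only the unmodified quantities. The function names are hypothetical.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L).
    Lower values indicate a better trade-off between fit and size."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def wald_statistic(coefficient: float, std_error: float) -> float:
    """Squared Wald statistic for a single coefficient; under the null
    hypothesis (coefficient = 0) it is asymptotically chi-squared
    with one degree of freedom."""
    return (coefficient / std_error) ** 2
```

As a usage note, a variable whose removal lowers the BIC, or whose Wald statistic is small, is a candidate for dropping. With many observations the ln(n) penalty grows, which pushes selection towards parsimonious models such as those the abstract describes.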
