Global ETD Search

91	Communication et statistiques publiques. Représentations dominantes / Communication and public statistics. Main representations Jucobin, Anne-Claire 14 December 2009 Outils dominants de représentation du monde social, les statistiques publiques jouent en France un rôle déterminant dans la revendication actuelle de rationalisation de l’action de l’Etat. A partir de l’étude du mode de publicité des statistiques de la délinquance et de la criminalité, ce travail analyse la place de ces données chiffrées dans les stratégies de communication publique. Il apparaît ainsi que cette prééminence tient à la fois au régime sémiotique spécifique des chiffres, à l’importance du quantitatif dans l’imaginaire individuel et collectif, autant qu’à une combinaison de légitimations (historiques, éthiques, politiques, scientifiques). Mais les rapports de pouvoir en jeu se définissent également par la manipulation des signes en vue d’imposer un système de valeurs. Ils se distinguent par la force d’une esthétique qui parvient à concilier complexité du savoir et apparente évidence de l’information, objectivité et subjectivité, opacité des instances auctoriales et transparence revendiquée. / As the prevailing tools of the representation of the social world, official statistics in France play a key role in the current claim to rationalise government actions. Studying the communicational choices concerning criminal statistics in France, this work emphasizes the role of statistics in the strategies of public communication: their importance lies in the semiotic nature of figures, the influence of quantitative matters in the individual and collective imagination, as well as a combination of legitimacy (historical, ethical, political, scientific). But the balance of powers involved is also defined by the manipulation of signs in order to establish a system of values. 
These are distinguished by the strength of an aesthetic, which strikes a balance between complexity of knowledge and apparent evidence of information, objectivity and subjectivity, opacity of auctorial authorities and claimed transparency. Communication publique Statistiques Esthétique Représentations Public communication Statistics Aesthetics Representations
92	Statistiques multivariées pour l'analyse du risque alimentaire / Multivariate statistics for dietary risk analysis Chautru, Emilie 06 September 2013 (has links) Véritable carrefour de problématiques économiques, biologiques, sociologiques, culturelles et sanitaires, l’alimentation suscite de nombreuses polémiques. Dans un contexte où les échanges mondiaux facilitent le transport de denrées alimentaires produites dans des conditions environnementales diverses, où la consommation de masse encourage les stratégies visant à réduire les coûts et maximiser le volume de production (OGM, pesticides, etc.) il devient nécessaire de quantifier les risques sanitaires que de tels procédés engendrent. Notre intérêt se place ici sur l’étude de l’exposition chronique, de l’ordre de l’année, à un ensemble de contaminants dont la nocivité à long terme est d’ores et déjà établie. Les dangers et bénéfices de l’alimentation ne se restreignant pas à l’ingestion ou non de substances toxiques, nous ajoutons à nos objectifs l’étude de certains apports nutritionnels. Nos travaux se centrent ainsi autour de trois axes principaux. Dans un premier temps, nous nous intéressons à l'analyse statistique des très fortes expositions chroniques à une ou plusieurs substances chimiques, en nous basant principalement sur des résultats issus de la théorie des valeurs extrêmes. Nous adaptons ensuite des méthodes d'apprentissage statistique de type ensembles de volume minimum pour l'identification de paniers de consommation réalisant un compromis entre risque toxicologique et bénéfice nutritionnel. Enfin, nous étudions les propriétés asymptotiques d'un certain nombre d'estimateurs permettant d'évaluer les caractéristiques de l'exposition, qui prennent en compte le plan de sondage utilisé pour collecter les données. / At a crossroads of economical, sociological, cultural and sanitary issues, dietary analysis is of major importance for public health institutes. 
When international trade facilitates the transportation of foodstuffs produced in very different environmental conditions, and when mass consumption encourages strategies that cut costs and maximize production volume (GMO, pesticides, etc.), it becomes necessary to quantify the health risks engendered by such practices. We are interested in the evaluation of chronic exposure (on a yearly scale) to food contaminants whose long-term toxicity is already well documented. Because dietary risks and benefits are not limited to the ingestion or avoidance of toxic substances, nutritional intakes are also considered. Our work is thus organized along three main lines of research. We first consider the statistical analysis of very high chronic exposure to one or more chemical substances present in food, adopting approaches based on extreme value theory. Then, we adapt statistical learning techniques for minimum volume set estimation in order to identify dietary habits that achieve a compromise between toxicological risk and nutritional benefit. Finally, we study the asymptotic properties of a number of statistics that assess the characteristics of the distribution of individual exposure and that take into account the survey scheme from which the data originate. Apports nutritionnels de long terme Mesure spectrale Théorie des sondages Processus empiriques Ensembles de volume minimum U-statistiques Risque-bénéfice Usual intakes Spectral measure Survey sampling Empirical processes Tail index estimation Minimum volume sets U-statistics Risk-benefit
93	Adaptation des méthodes d’apprentissage aux U-statistiques / Adapting machine learning methods to U-statistics Colin, Igor 24 November 2016 (has links) L’explosion récente des volumes de données disponibles a fait de la complexité algorithmique un élément central des méthodes d’apprentissage automatique. Les algorithmes d’optimisation stochastique ainsi que les méthodes distribuées et décentralisées ont été largement développés durant les dix dernières années. Ces méthodes ont permis de faciliter le passage à l’échelle pour optimiser des risques empiriques dont la formulation est séparable en les observations associées. Pourtant, dans de nombreux problèmes d’apprentissage statistique, l’estimation précise du risque s’effectue à l’aide de U-statistiques, des fonctions des données prenant la forme de moyennes sur des d-uplets. Nous nous intéressons tout d’abord au problème de l’échantillonnage pour la minimisation du risque empirique. Nous montrons que le risque peut être remplacé par un estimateur de Monte-Carlo, intitulé U-statistique incomplète, basé sur seulement O(n) termes et permettant de conserver un taux d’apprentissage du même ordre. Nous établissons des bornes sur l’erreur d’approximation du U-processus et les simulations numériques mettent en évidence l’avantage d’une telle technique d’échantillonnage. Nous portons par la suite notre attention sur l’estimation décentralisée, où les observations sont désormais distribuées sur un réseau connexe. Nous élaborons des algorithmes dits gossip, dans des cadres synchrones et asynchrones, qui diffusent les observations tout en maintenant des estimateurs locaux de la U-statistique à estimer. Nous démontrons la convergence de ces algorithmes avec des dépendances explicites en les données et la topologie du réseau. Enfin, nous traitons de l’optimisation décentralisée de fonctions dépendant de paires d’observations. 
De même que pour l’estimation, nos méthodes sont basées sur la concomitance de la propagation des observations et l’optimisation locale du risque. Notre analyse théorique souligne que ces méthodes conservent une vitesse de convergence du même ordre que dans le cas centralisé. Les expériences numériques confirment l’intérêt pratique de notre approche. / With the increasing availability of large amounts of data, computational complexity has become a keystone of many machine learning algorithms. Stochastic optimization algorithms and distributed/decentralized methods have been widely studied over the last decade and provide increased scalability for optimizing an empirical risk that is separable in the data sample. Yet, in a wide range of statistical learning problems, the risk is accurately estimated by U-statistics, i.e., functionals of the training data with low variance that take the form of averages over d-tuples. We first tackle the problem of sampling for empirical risk minimization. We show that empirical risks can be replaced by computationally much simpler Monte-Carlo estimates based on O(n) terms only, usually referred to as incomplete U-statistics, without damaging the learning rate. We establish uniform deviation results, and numerical examples show that such an approach surpasses more naive subsampling techniques. We then focus on decentralized estimation, where the data sample is distributed over a connected network. We introduce new synchronous and asynchronous randomized gossip algorithms which simultaneously propagate data across the network and maintain local estimates of the U-statistic of interest. We establish convergence rate bounds with explicit data and network dependent terms. Finally, we deal with the decentralized optimization of functions that depend on pairs of observations. Similarly to the estimation case, we introduce a method based on concurrent local updates and data propagation. 
Our theoretical analysis reveals that the proposed algorithms preserve the convergence rate of centralized dual averaging up to an additive bias term. Our simulations illustrate the practical interest of our approach. U-statistique Gossip Optimisation décentralisée Graphe U-statistic Gossip Decentralized optimization Graph
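The sampling idea in the abstract above (replacing a full average over all pairs by a Monte-Carlo average over O(n) randomly drawn pairs) can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the variance kernel, the Gaussian sample and the choice of drawing pairs with replacement are all assumptions made for illustration.

```python
import itertools
import random
import statistics


def complete_u_statistic(data, kernel):
    """Average the kernel over all n*(n-1)/2 unordered pairs (O(n^2) terms)."""
    pairs = list(itertools.combinations(data, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def incomplete_u_statistic(data, kernel, n_terms, rng):
    """Average the kernel over n_terms pairs drawn uniformly at random.

    Assumption for this sketch: pairs are drawn independently (so a pair may
    repeat), and the two indices inside one pair are always distinct.
    """
    n = len(data)
    total = 0.0
    for _ in range(n_terms):
        i, j = rng.sample(range(n), 2)
        total += kernel(data[i], data[j])
    return total / n_terms


def variance_kernel(x, y):
    """Degree-2 kernel whose complete U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = [rng.gauss(0.0, 1.0) for _ in range(500)]  # hypothetical sample

full = complete_u_statistic(data, variance_kernel)
approx = incomplete_u_statistic(data, variance_kernel, n_terms=len(data), rng=rng)
print(full, statistics.variance(data), approx)
```

With this kernel the complete U-statistic coincides with the usual unbiased sample variance, which gives a quick sanity check; the incomplete version uses only n kernel evaluations instead of n(n-1)/2 and should land near the same value.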
94	Contribution à la statistique spatiale et l'analyse de données fonctionnelles / Contribution to spatial statistics and functional data analysis Ahmed, Mohamed Salem 12 December 2017 (has links) Ce mémoire de thèse porte sur la statistique inférentielle des données spatiales et/ou fonctionnelles. En effet, nous nous sommes intéressés à l’estimation de paramètres inconnus de certains modèles à partir d’échantillons obtenus par un processus d’échantillonnage aléatoire ou non (stratifié), composés de variables indépendantes ou spatialement dépendantes.La spécificité des méthodes proposées réside dans le fait qu’elles tiennent compte de la nature de l’échantillon étudié (échantillon stratifié ou composé de données spatiales dépendantes).Tout d’abord, nous étudions des données à valeurs dans un espace de dimension infinie ou dites ”données fonctionnelles”. Dans un premier temps, nous étudions les modèles de choix binaires fonctionnels dans un contexte d’échantillonnage par stratification endogène (échantillonnage Cas-Témoin ou échantillonnage basé sur le choix). La spécificité de cette étude réside sur le fait que la méthode proposée prend en considération le schéma d’échantillonnage. Nous décrivons une fonction de vraisemblance conditionnelle sous l’échantillonnage considérée et une stratégie de réduction de dimension afin d’introduire une estimation du modèle par vraisemblance conditionnelle. Nous étudions les propriétés asymptotiques des estimateurs proposées ainsi que leurs applications à des données simulées et réelles. Nous nous sommes ensuite intéressés à un modèle linéaire fonctionnel spatial auto-régressif. La particularité du modèle réside dans la nature fonctionnelle de la variable explicative et la structure de la dépendance spatiale des variables de l’échantillon considéré. La procédure d’estimation que nous proposons consiste à réduire la dimension infinie de la variable explicative fonctionnelle et à maximiser une quasi-vraisemblance associée au modèle. 
Nous établissons la consistance, la normalité asymptotique et les performances numériques des estimateurs proposés. Dans la deuxième partie du mémoire, nous abordons des problèmes de régression et prédiction de variables dépendantes à valeurs réelles. Nous commençons par généraliser la méthode de k-plus proches voisins (k-nearest neighbors; k-NN) afin de prédire un processus spatial en des sites non-observés, en présence de co-variables spatiaux. La spécificité du prédicteur proposé est qu’il tient compte d’une hétérogénéité au niveau de la co-variable utilisée. Nous établissons la convergence presque complète avec vitesse du prédicteur et donnons des résultats numériques à l’aide de données simulées et environnementales. Nous généralisons ensuite le modèle probit partiellement linéaire pour données indépendantes à des données spatiales. Nous utilisons un processus spatial linéaire pour modéliser les perturbations du processus considéré, permettant ainsi plus de flexibilité et d’englober plusieurs types de dépendances spatiales. Nous proposons une approche d’estimation semi paramétrique basée sur une vraisemblance pondérée et la méthode des moments généralisées et en étudions les propriétés asymptotiques et performances numériques. Une étude sur la détection des facteurs de risque de cancer VADS (voies aéro-digestives supérieures) dans la région Nord de France à l’aide de modèles spatiaux à choix binaire termine notre contribution. / This thesis is about statistical inference for spatial and/or functional data. Indeed, we are interested in the estimation of unknown parameters of some models from random or non-random (stratified) samples composed of independent or spatially dependent variables. The specificity of the proposed methods lies in the fact that they take into consideration the nature of the sample considered (stratified or spatial sample). We begin by studying data valued in a space of infinite dimension, so-called “functional data”. 
First, we study a functional binary choice model explored in a case-control or choice-based sample design context. The specificity of this study is that the proposed method takes into account the sampling scheme. We describe a conditional likelihood function under the sampling distribution and a dimension reduction strategy to define a feasible conditional maximum likelihood estimator of the model. Asymptotic properties of the proposed estimates as well as their application to simulated and real data are given. Secondly, we explore a functional linear autoregressive spatial model whose particularity lies in the functional nature of the explanatory variable and the structure of the spatial dependence. The estimation procedure consists of reducing the infinite dimension of the functional variable and maximizing a quasi-likelihood function. We establish the consistency and asymptotic normality of the estimator. The usefulness of the methodology is illustrated via simulations and an application to some real data. In the second part of the thesis, we address some estimation and prediction problems of real random spatial variables. We start by generalizing the k-nearest neighbors method, namely k-NN, to predict a spatial process at non-observed locations using some covariates. The specificity of the proposed k-NN predictor lies in the fact that it is flexible and allows for some heterogeneity in the covariate. We establish the almost complete convergence with rates of the spatial predictor, whose performance is illustrated by an application to simulated and environmental data. In addition, we generalize the partially linear probit model of independent data to the spatial case. We use a linear process for disturbances allowing various spatial dependencies and propose a semiparametric estimation approach based on weighted likelihood and the generalized method of moments. 
We establish the consistency and asymptotic distribution of the proposed estimators and investigate the finite sample performance of the estimators on simulated data. We end with an application of spatial binary choice models to identify UADT (Upper aerodigestive tract) cancer risk factors in the north region of France, which displays the highest rates of such cancer incidence and mortality in the country. Modèle à choix binaire Analyses de données fonctionnelles Échantillonnage basé sur le choix Échantillonnage Cas-Témoin Modèle linéaire fonctionnel Processus auto-régressif spatial Quasi-maximum de vraisemblance Statistique Non-paramétrique Régression, Prédiction K-plus proches voisins Estimateur à Noyau Processus spatial Econométrie spatiale Estimation Semi-paramétrique Méthodes des moments généralisées Binary choice model Functional data analysis Choice-based sampling Case-control Functional Linear Model Spatial Autoregressive Process Quasi-maximum likelihood estimator Nonparametric statistics Regression Prediction K-nearest neighbors Kernel estimate Spatial process Spatial econometrics Semi-parametric estimation Generalized method of moments
95	Estatística em bioequivalência: garantia na qualidade do medicamento genérico / Statistics on Bioequivalence: Guarantee in quality of generic drug Souza, Roberto Molina de 16 February 2009 SOUZA, R. M. "Estatística em Bioequivalência: Garantia na qualidade do medicamento genérico". 2008. 42 f Dissertação (Mestrado em Saúde na Comunidade) Faculdade de Medicina de Ribeirão Preto - USP Como alternativa aos medicamentos de uso humano de grande circulação no mercado brasileiro foram regulamentados os medicamentos genéricos, conforme a Lei dos genéricos nº 9787/99, que evidenciaram os estudos de bioequivalência e biodisponibilidade no Brasil com o objetivo de avaliar a bioequivalência das formulações genéricas, tomando-se como referências os medicamentos já existentes no mercado e com eficácia comprovada. Duas formulações de um mesmo medicamento são consideradas bioequivalentes se suas biodisponibilidades não apresentam evidências de diferenças significativas segundo limites clinicamente especificados, denominados limites de bioequivalência. Os estudos de bioequivalência são realizados mediante a administração de duas formulações, sendo que uma está em teste e a outra é a referência, em um número de voluntários previamente definidos, usando-se um planejamento experimental, na maioria das vezes do tipo crossover. Após a retirada de sucessivas amostras sanguíneas ou urinárias em tempos pré-determinados, estudam-se alguns parâmetros farmacocinéticos como área sob a curva de concentração, concentração máxima do fármaco e tempo em que a concentração máxima ocorre. Esta dissertação de mestrado introduz alguns conceitos básicos de bioequivalência para, logo em seguida, apresentar análises Bayesianas para medidas de bioequivalência tanto univariada como multivariada assumindo a distribuição normal multivariada para os dados e também a distribuição de Student multivariada. 
Uma aplicação a fim de exemplificar o que foi introduzido é apresentada e, para o conjunto de dados em estudo, têm-se, por meio de critérios de seleção de modelos, evidências favoráveis à escolha dos modelos multivariados para a condução deste estudo de bioequivalência média. / SOUZA, R. M. "Statistics on Bioequivalence: Guarantee in quality of generic drug". 2008. 42 s Dissertation (Master Degree) Faculdade de Medicina de Ribeirão Preto - USP As an alternative to widely marketed medicines for human use in Brazil, generic medicines were regulated under the Generics Law no. 9787/99, which brought bioavailability and bioequivalence studies to prominence in Brazil as a means of evaluating the bioequivalence of generic formulations, taking as reference existing medicinal products with proven efficacy. Two formulations of the same drug are considered bioequivalent if their bioavailabilities show no evidence of significant differences according to clinically specified limits, known as bioequivalence limits. Bioequivalence studies are carried out by administering two formulations (one under test and the other the reference) to a pre-defined number of volunteers, using an experimental design that is most often of the crossover type. After successive blood or urine samples are withdrawn at predetermined times, some pharmacokinetic parameters are studied, such as the area under the concentration curve, the maximum drug concentration and the time at which the maximum concentration occurs. This dissertation introduces some basic concepts of bioequivalence and then presents Bayesian analyses for both univariate and multivariate bioequivalence measures, assuming the multivariate normal distribution for the data and also the multivariate Student t distribution. 
An application illustrating these concepts is presented and, for the data set under study, model selection criteria provided evidence in favor of the multivariate models for conducting this average bioequivalence study. average bioequivalence Bayesian inference bioequivalência média etapa Estatística generic drug. inferência bayesiana medicamento genérico. step Statistics
96	Modelos de custo e estatísticas para consultas por similaridade / Cost models and statistics for similarity searching Bêdo, Marcos Vinícius Naves 10 October 2017 (has links) Consultas por similaridade constituem um paradigma de busca que fornece suporte à diversas tarefas computacionais, tais como agrupamento, classificação e recuperação de informação. Neste contexto, medir a similaridade entre objetos requer comparar a distância entre eles, o que pode ser formalmente modelado pela teoria de espaços métricos. Recentemente, um grande esforço de pesquisa tem sido dedicado à inclusão de consultas por similaridade em Sistemas Gerenciadores de Bases de Dados (SGBDs), com o objetivo de (i) permitir a combinação de comparações por similaridade com as comparações por identidade e ordem já existentes em SGBDs e (ii) obter escalabilidade para grandes bases de dados. Nesta tese, procuramos dar um próximo passo ao estendermos também o otimizador de consultas de um SGBD. Em particular, propomos a ampliação de dois módulos do otimizador: o módulo de Espaço de Distribuição de Dados e o módulo de Modelo de Custo. Ainda que o módulo de Espaço de Distribuição de Dados permita representar os dados armazenados, essas representações são insuficientes para modelar o comportamento das comparações em espaços métricos, sendo necessário estender este módulo para contemplar distribuições de distância. De forma semelhante, o módulo Modelo de Custo precisa ser ampliado para dar suporte à modelos de custo que utilizem estimativas sobre distribuições de distância. Toda a investigação aqui conduzida se concentra em cinco contribuições. Primeiro, foi criada uma nova sinopse para distribuições de distância, o Histograma Compactado de Distância (CDH), de onde é possível inferir valores de seletividade e raios para consultas por similaridade. Uma comparação experimental permitiu mostrar os ganhos das estimativas da sinopse CDH com relação à diversos competidores. 
Também foi proposto um modelo de custo baseado na sinopse CDH, o modelo Stockpile, cujas estimativas se mostraram mais precisas na comparação com outros modelos. Os Histogramas-Omni são apresentados como a terceira contribuição desta tese. Estas estruturas de indexação, construídas a partir de restrições de particionamento de histogramas, permitem a execução otimizada de consultas que mesclam comparações por similaridade, identidade e ordem. A quarta contribuição de nossa investigação se refere ao modelo RVRM, que é capaz de indicar quanto é possível empregar as estimativas das sinopses de distância para otimizar consultas por similaridade em conjuntos de dados de alta dimensionalidade. O modelo RVRM se mostrou capaz de identificar intervalos de dimensões para os quais essas consultas podem ser executadas eficientes. Finalmente, a última contribuição desta tese propõe a integração das sinopses e modelos revisados em um sistema com sintaxe de alto nível que pode ser acoplado em um otimizador de consultas. / Similarity searching is a foundational paradigm for many modern computer applications, such as clustering, classification and information retrieval. Within this context, the meaning of similarity is related to the distance between objects, which can be formally expressed by the Metric Spaces Theory. Many studies have focused on the inclusion of similarity search into Database Management Systems (DBMSs) for (i) enabling similarity comparisons to be combined with the DBMSs identity and order comparisons and (ii) providing scalability for very large databases. As a step further, we propose the extension of the DBMS Query Optimizer and, particularly, the extension of two modules of the Query Optimizer, namely Data Distribution Space and Cost Model modules. 
Although the Data Distribution Space enables representations of stored data, such representations are unsuitable for modeling the behavior of similarity comparisons, which requires the extension of the module to support distance distributions. Likewise, the Cost Model module must be extended to support cost models that depend on distance distributions. Our study is based on five contributions. A new synopsis for distance distributions, called Compact-Distance Histogram (CDH), is proposed and enables radius and selectivity estimation for similarity searching. An experimental comparison showed the gains of the estimates drawn from CDH in comparison to several competitors. A cost model based on the CDH synopsis and with accurate estimates, called Stockpile, is also proposed. Omni-Histograms are presented as the third contribution of the thesis. Such indexing structures are constructed according to histogram partition constraints and enable the optimization of queries that combine similarity, identity and order comparisons. The fourth contribution refers to the model RVRM, which indicates the possible use of the estimates obtained from distance-based synopses for the query optimization of high-dimensional datasets and identifies intervals of dimensions where similarity searching can be efficiently executed. Finally, the thesis proposes the integration of the reviewed synopses and cost models into a single system with a high-level language that can be coupled to a DBMS Query Optimizer. Concentração de distâncias Consultas por similaridade Distance concentration Otimização de consultas Query optimization Similarity searching
97	Construção de redes usando estatística clássica e Bayesiana - uma comparação / Building complex networks through classical and Bayesian statistics - a comparison Thomas, Lina Dornelas 13 March 2012 Nesta pesquisa, estudamos e comparamos duas maneiras de se construir redes. O principal objetivo do nosso estudo é encontrar uma forma efetiva de se construir redes, especialmente quando temos menos observações do que variáveis. A construção das redes é realizada através da estimação do coeficiente de correlação parcial com base na estatística clássica (inverse method) e na Bayesiana (priori conjugada Normal - Wishart invertida). No presente trabalho, para resolver o problema de se ter menos observações do que variáveis, propomos uma nova metodologia, a qual chamamos correlação parcial local, que consiste em selecionar, para cada par de variáveis, as demais variáveis que apresentam maior coeficiente de correlação com o par. Aplicamos essas metodologias em dados simulados e as comparamos traçando curvas ROC. O resultado mais atrativo foi que, mesmo com custo computacional alto, usar inferência Bayesiana é melhor quando temos menos observações do que variáveis. Em outros casos, ambas abordagens apresentam resultados satisfatórios. / This research studies and compares two different ways of building complex networks. The main goal of our study is to find an effective way to build networks, particularly when we have fewer observations than variables. We construct networks by estimating the partial correlation coefficient with classical statistics (inverse method) and with Bayesian statistics (Normal - Inverse Wishart conjugate prior). In this work, in order to solve the problem of having fewer observations than variables, we propose a new methodology called local partial correlation, which consists of selecting, for each pair of variables, the other variables most correlated to the pair. 
We applied these methods to simulated data and compared them through ROC curves. The most attractive result is that, despite its high computational cost, Bayesian inference performs better when we have fewer observations than variables. In other cases, both approaches present satisfactory results. Bayesian statistics complex networks correlação parcial estatística Bayesiana inverse method método inverso partial correlation redes
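For readers unfamiliar with partial correlation networks, the following is a minimal sketch of one standard way to obtain partial correlations: invert the correlation (or covariance) matrix and rescale the off-diagonal entries. It is an assumption that this matches what the abstract calls the "inverse method"; the function names and the 3-variable example matrix are hypothetical. The sketch also shows why the case of fewer observations than variables is hard: the sample covariance matrix is then singular and cannot be inverted directly.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (e.g. fewer observations than variables)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation of each pair given all other variables.

    Uses rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the inverse of cov.
    """
    prec = invert(cov)
    n = len(prec)
    return [
        [1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical chain X1 - X2 - X3: X1 and X3 are correlated only through X2.
corr = [
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
]
pcor = partial_correlations(corr)
for row in pcor:
    print([round(v, 4) for v in row])
```

In this example the partial correlation between the first and third variables vanishes once the second is controlled for, so a network built by thresholding these values would contain no edge between them.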
98	APLICATIONS OF PROBABILITY AND STATISTICS IN GEOTECHNICAL ANALYSES / APLICAÇÕES DE PROBABILIDADE E ESTATÍSTICA EM ANÁLISES GEOTÉCNICAS ROMULO CASTELLO HENRIQUES RIBEIRO 12 June 2008 (has links) CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO / Em análises geotécnicas, previsões de deformações ou de fatores de segurança são desenvolvidas com base em métodos determinísticos, que admitem como fixos e conhecidos os parâmetros do solo ou da rocha. Entretanto, tais previsões são afetadas por incertezas provenientes da impossibilidade de reprodução das condições de campo em laboratório, da perturbação do solo devida à instalação de instrumentos, das ocorrências geomecânicas não detectadas durante a campanha de sondagens, da variabilidade inerente ao maciço, entre outras. O estudo da influência dessas incertezas sobre os cálculos determinísticos, com a possibilidade da quantificação do risco de insucesso associado a um projeto geotécnico, desenvolveu-se durante as últimas décadas com base nas teorias de probabilidade e estatística. O presente trabalho realiza uma revisão bibliográfica de conceitos básicos de probabilidade e estatística, mostrando alguns avanços da aplicação desses conceitos na engenharia geotécnica. Visando apresentar formas de estimarem-se probabilidades de recalque inadmissível ou de ruptura são realizadas análises para os seguintes casos: recalques de argila mole solicitada por aterro e de fundações superficiais em areia, estabilidade de fundação superficial em solo residual e de fundação profunda em solo sedimentar, deslizamento de um muro de arrimo e estabilidade de um talude. Com o objetivo de inferir acerca dos fatores que influenciam as estimativas probabilísticas, para cada caso são realizadas comparações entre resultados obtidos com base em diferentes métodos probabilísticos e/ou determinísticos. 
/ In geotechnical analyses, predictions of safety factors or deformations are developed on the basis of deterministic methods, which treat the parameters of the soil or rock as fixed and known. However, such predictions are affected by uncertainties arising from the impossibility of reproducing field conditions in the laboratory, from soil disturbance caused by the installation of instruments, from geomechanical occurrences not detected during the site investigation campaign, and from the inherent variability of the soil, among others. The study of the influence of these uncertainties on deterministic calculations, with the possibility of quantifying the risk of failure associated with a geotechnical project, has developed over the last decades on the basis of the theories of probability and statistics. The present work reviews the literature on basic concepts of probability and statistics, showing some advances in the application of these concepts in geotechnical engineering. In order to show ways of computing probabilities of rupture or of inadmissible settlement, analyses are made for the following cases: settlement of fill on soft clay, settlement of superficial foundations in sand, stability of superficial foundation in residual soil, stability of deep foundation in sand, stability of retaining wall and dam slope stability. In order to verify the factors that influence the probabilistic estimates, results given by different probabilistic and/or deterministic methods are compared for each case.
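As a rough illustration of what "estimating a probability of failure" and "comparing probabilistic methods" can look like, the sketch below treats resistance R and load S as independent normal variables and computes P(R - S < 0) in two ways: in closed form through a reliability index, and by Monte Carlo simulation. The abstract does not say which methods the thesis uses, so both the methods and every number here are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def reliability_index(mean_r, sd_r, mean_s, sd_s):
    """Reliability index beta for the margin M = R - S, with R and S independent normals."""
    return (mean_r - mean_s) / math.sqrt(sd_r ** 2 + sd_s ** 2)


def failure_probability_exact(mean_r, sd_r, mean_s, sd_s):
    """P(R - S < 0) = Phi(-beta) under the independent-normal assumption."""
    return NormalDist().cdf(-reliability_index(mean_r, sd_r, mean_s, sd_s))


def failure_probability_mc(mean_r, sd_r, mean_s, sd_s, n_samples, rng):
    """Monte Carlo estimate of P(R - S < 0) by direct sampling."""
    failures = 0
    for _ in range(n_samples):
        if rng.gauss(mean_r, sd_r) - rng.gauss(mean_s, sd_s) < 0.0:
            failures += 1
    return failures / n_samples


# Hypothetical values (e.g. kPa): resistance 150 +/- 30, load 100 +/- 20.
params = dict(mean_r=150.0, sd_r=30.0, mean_s=100.0, sd_s=20.0)
beta = reliability_index(**params)
pf_exact = failure_probability_exact(**params)
pf_mc = failure_probability_mc(**params, n_samples=100_000, rng=random.Random(1))
print(beta, pf_exact, pf_mc)
```

A deterministic check on the same hypothetical means would report a safety factor of 1.5 and stop there; the probabilistic view adds that, given the assumed scatter, failure is still possible with a probability of several percent.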
99	Estatística em confiabilidade de sistemas: uma abordagem Bayesiana paramétrica / Statistics on systems reliability: a parametric Bayesian approach Rodrigues, Agatha Sacramento 17 August 2018 A confiabilidade de um sistema de componentes depende da confiabilidade de cada componente. Assim, a estimação da função de confiabilidade de cada componente do sistema é de interesse. No entanto, esta não é uma tarefa fácil, pois quando o sistema falha, o tempo de falha de um dado componente pode não ser observado, isto é, um problema de dados censurados. Neste trabalho, propomos modelos Bayesianos paramétricos para estimação das funções de confiabilidade de componentes e sistemas em quatro diferentes cenários. Inicialmente, um modelo Weibull é proposto para estimar a distribuição do tempo de vida de um componente de interesse envolvido em sistemas coerentes não reparáveis, quando estão disponíveis o tempo de falha do sistema e o estado do componente no momento da falha do sistema. Não é imposta a suposição de que os tempos de vida dos componentes sejam identicamente distribuídos, mas a suposição de independência entre os tempos até a falha dos componentes é necessária, conforme teorema anunciado e devidamente demonstrado. Em situações com causa de falha mascarada, os estados dos componentes no momento da falha do sistema não são observados e, neste cenário, um modelo Weibull com variáveis latentes no processo de estimação é proposto. Os dois modelos anteriormente descritos propõem estimar marginalmente as funções de confiabilidade dos componentes quando não são disponíveis ou necessárias as informações dos demais componentes e, por consequência, a suposição de independência entre os tempos de vida dos componentes é necessária. Com o intuito de não impor esta suposição, o modelo Weibull multivariado de Hougaard é proposto para a estimação das funções de confiabilidade de componentes envolvidos em sistemas coerentes não reparáveis. 
Por fim, um modelo Weibull para a estimação da função de confiabilidade de componentes de um sistema em série reparável com causa de falha mascarada é proposto. Para cada cenário considerado, diferentes estudos de simulação são realizados para avaliar os modelos propostos, sempre comparando com a melhor solução encontrada na literatura até então, em que, em geral, os modelos propostos apresentam melhores resultados. Com o intuito de demonstrar a aplicabilidade dos modelos, análises de dados são realizadas com problemas reais não só da área de confiabilidade, mas também da área social. / The reliability of a system of components depends on the reliability of each component. Thus, the initial statistical work should be the estimation of the reliability of each component of the system. This is not an easy task because, when the system fails, the failure time of a given component may not be observed, which is a censored-data problem. We propose parametric Bayesian models for estimating the reliability functions of systems and components in four scenarios. First, a Weibull model is proposed to estimate the component failure time distribution in non-repairable coherent systems when the system failure time and the component status at the moment of system failure are available. Furthermore, identically distributed failure times are not a required restriction. An important result is proved: without the assumption that the components' lifetimes are mutually independent, a given set of sub-reliability functions does not identify the corresponding marginal reliability function. In masked cause of failure situations, it is not possible to identify the statuses of the components at the moment of system failure and, in this second scenario, we propose a Bayesian Weibull model by means of latent variables in the estimation process. 
The two models described above estimate the components' reliability functions marginally when information about the other components is not available or not needed and, consequently, the assumption of independence among the components' failure times is necessary. To avoid imposing this assumption, the Hougaard multivariate Weibull model is proposed for estimating the reliability functions of components in non-repairable coherent systems. Finally, a Weibull model is proposed for estimating the reliability functions of the components of a repairable series system with masked cause of failure. For each scenario, different simulation studies are carried out to evaluate the proposed models, always comparing them with the best solution found in the literature up to that point. In general, the proposed models present better results. To demonstrate the applicability of the models, data analyses are performed on real problems not only from the reliability area but also from the social area. Análise de confiabilidade Bayesian paradigm Bridge system Causa de falha mascarada Coherent system Component reliability functions Dados mascarados Distribuição preditiva Estimação paramétrica FBST Hougaard model k-out-of-m system Masked cause of failure Masked data Medidas de importância dos componentes Modelo de Hougaard Modelo Weibull Non-repairable system Paradigma Bayesiano Parallel system Parametric estimation Predictive distribution Reliability analysis Repairable system Series system Sistema coerente Sistema de ponte Sistema em paralelo Sistema em série Sistema k-de-m Sistema não reparável Sistema reparável Weibull model
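To make the dependence of system reliability on component reliability concrete, the sketch below evaluates a Weibull reliability function and combines components into series and parallel systems. It assumes independent component lifetimes and known Weibull parameters, and it does not attempt the Bayesian estimation from censored or masked data that the thesis addresses. The function names and the (shape, scale) parameterisation are choices made for this example.

```python
import math
from typing import Sequence, Tuple

# Each component is described by hypothetical (shape, scale) Weibull parameters.
Component = Tuple[float, float]


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """Probability that a Weibull lifetime exceeds time t."""
    return math.exp(-((t / scale) ** shape))


def series_reliability(t: float, components: Sequence[Component]) -> float:
    """A series system works only while every independent component works."""
    return math.prod(weibull_reliability(t, k, lam) for k, lam in components)


def parallel_reliability(t: float, components: Sequence[Component]) -> float:
    """A parallel system fails only when every independent component has failed."""
    all_failed = math.prod(
        1.0 - weibull_reliability(t, k, lam) for k, lam in components
    )
    return 1.0 - all_failed


if __name__ == "__main__":
    parts = [(1.5, 10.0), (2.0, 8.0)]  # illustrative values only
    print(series_reliability(5.0, parts), parallel_reliability(5.0, parts))
```

The product form in both functions is valid only under independence, which is the assumption the Hougaard multivariate Weibull model in the abstract is meant to relax.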
100	Métodos estatísticos na análise de experimentos de microarray / Statistical Methods in Microarray Experiment Analysis Cristo, Elier Broche 30 October 2003 Neste trabalho é proposto um estudo comparativo de alguns métodos de Agrupamento (Hierárquico, K-médias e Self-Organizing Maps) e de Classificação (K-Vizinhos, Fisher, Máxima Verossimilhança, Aggregating e Regressão Local), os quais são apresentados teoricamente. Tais métodos são testados e comparados em conjuntos de dados reais, gerados com a técnica de Microarray. Esta técnica permite mensurar os níveis de expressão de milhares de genes simultaneamente, possibilitando comparações entre amostras de tecidos pelos perfis de expressão. É apresentada uma revisão de conceitos básicos relacionados ao processo de normalização, sendo este uma das primeiras etapas da análise deste tipo de conjunto de dados. Em particular, estivemos interessados em encontrar pequenos grupos de genes que fossem “suficientes” para distinguir amostras em condições biológicas diferentes. Por fim, é proposto um método de busca que, dado os resultados de um experimento envolvendo um grande número de genes, encontra de uma forma eficiente os melhores classificadores. / In this work we propose a comparative study of some clustering methods (Hierarchical, K-Means and Self-Organizing Maps) and some classification methods (K-Neighbours, Fisher, Maximum Likelihood, Aggregating and Local Regression), which are presented theoretically. The methods are tested and compared on real data sets generated from microarray experiments. This technique allows the expression levels of thousands of genes to be measured simultaneously, thus allowing tissue samples to be compared by their expression profiles. We present a review of basic concepts regarding normalization of microarray data, one of the first steps in microarray analysis. 
In particular, we were interested in finding small groups of genes that were “sufficient” to distinguish samples originating from different biological conditions. Finally, a search method is proposed that efficiently finds the best classifiers from the results of an experiment involving a very large number of genes. Análise estatística Statistical analysis microarrays Microarrays
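Among the classification methods the abstract lists, K-nearest neighbours is the simplest to show in a few lines. The sketch below classifies a sample from a small set of gene expression values by majority vote among its nearest training samples. The Euclidean distance, the value of k, and the toy data are assumptions of this example; the abstract does not state the settings used in the thesis.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(
    train: Sequence[Sequence[float]],
    labels: Sequence[str],
    query: Sequence[float],
    k: int = 3,
) -> str:
    """Label a query sample by majority vote among its k nearest training samples."""
    if len(train) != len(labels):
        raise ValueError("train and labels must have the same length")
    # Rank training samples by Euclidean distance to the query.
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy expression profiles over two hypothetical genes.
    samples = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
    conditions = ["control", "control", "tumour", "tumour"]
    print(knn_predict(samples, conditions, (0.05, 0.1)))
```

Restricting the feature vectors to a small group of genes, as in this toy data, mirrors the abstract's interest in gene subsets that suffice to separate biological conditions.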
