Global ETD Search

51	Análise comparativa de modelos de estatística multivariada aplicados à previsão de níveis de poluentes atmosféricos. / Comparative analysis of multivariate statistical models applied to the prediction of air pollutant levels. Paula, Renata Ramos Rodrigues de 07 February 2017
The present study compares the performance of the multivariate statistical models Multi-layer Perceptron Neural Networks, Random Forests and Support Vector Machine in predicting daily maximum concentrations of ground-level ozone in the Metropolitan Area of São Paulo (MASP), a region characterized by high population density and intense economic activity, where air quality is affected mainly by episodes of high ozone levels. Both regression and classification models were applied. In the classification cases, two discriminant analysis models were also studied: Linear Discriminant Analysis and Fisher Discriminant Analysis. The models were built from a database of meteorological variables and ozone concentration measurements provided by the Environmental Agency of São Paulo State (CETESB). Given the great importance and complexity of the process of ozone formation in the lower atmosphere, the University of São Paulo (USP) and CETESB have carried out studies in this area since 1999, which produced prediction models based on neural networks, implemented by the CETESB team. The present study continues that work and introduces the following innovations in methodology and expected results: (1) fitting of new models with new structures, including support vector machines, random forests and discriminant analysis; (2) use of a larger and more up-to-date database, to improve the representativeness of the models; (3) adjustment of the models to the new legislation, State Decree 59.113 of 23 April 2013, which establishes new air quality standards for atmospheric pollutants, including ozone. Although none of the classification models performed well, the regression models yielded better results than expected. The multi-layer perceptron model performed best in predicting daily maximum ozone concentrations, based both on hourly averages and on eight-hour moving averages, with correlation coefficients of 0.867 and 0.891, respectively.
Machine learning, Environmental, Multivariate statistic, Ozone, Pollution, Air quality
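The two prediction targets named in the abstract of record 51 (the daily maximum of hourly averages and the daily maximum of eight-hour moving averages) and the correlation coefficient used to score the models can be illustrated with a small sketch. Everything here is an assumption for illustration: the function names, the hypothetical hourly series, and the choice to take moving averages only over windows that lie within a single day. It is not the procedure used in the thesis.

```python
import numpy as np

def daily_max_targets(hourly, window=8):
    """Return (max hourly value, max `window`-hour moving average) for one day."""
    hourly = np.asarray(hourly, dtype=float)
    # "valid" keeps only full windows, so no partial averages at the day's edges.
    moving = np.convolve(hourly, np.ones(window) / window, mode="valid")
    return float(hourly.max()), float(moving.max())

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    return float(np.corrcoef(observed, predicted)[0, 1])

# Hypothetical day: concentration rising linearly over 24 hours (arbitrary units).
hourly = np.arange(24.0)
max_1h, max_8h = daily_max_targets(hourly)
print(max_1h, max_8h)
```

The eight-hour target is smoother than the hourly one, which is one plausible reason the two targets can give different correlation coefficients for the same model.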
52	A importância da estatística no ensino médio / The importance of statistics in high school Zen, Priscila Dombrovski 18 December 2017
The main objective of this research was to analyze documents, textbooks and ENEM questions in order to verify how they indicate ways of working with Statistics content within the Mathematics subject in high school. The research is qualitative and exploratory, and is based on the principles of action research. On that basis, we reviewed the literature on the teaching of Statistics, which showed the potential that Statistics offers for the teaching of Mathematics in basic education. The legislation examined indicates, each document in its own field of interest, the need for a complete education that gives students the means to become full citizens in the physical, cognitive and socio-emotional aspects. The next point analyzed was how the content of the subject is presented to students in textbooks. To this end, a survey of the collections used by schools was carried out, identifying the set of works used in the great majority of institutions. According to the analysis, this same set is the one that meets the assumptions of the relevant legislation and follows the direction proposed by the authors in the literature. ENEM questions were then examined to determine how many of them involve Statistics and which topics they cover. As a final product, a teaching support material for teachers was prepared, containing two teaching proposals supported by the Microsoft Office Excel software. Both proposals are contextualized around basic Statistics content aimed at high school.
High School, Mathematics, Statistic
53	Memória institucional e gestão documental no Instituto de Matemática e Estatística da UFRGS / Institutional memory and document management at the Institute of Mathematics and Statistics of UFRGS Gutierrez, Ana Lérida Pacheco January 2017
This dissertation investigates the relationships between institutional memory and document management at the Institute of Mathematics and Statistics (IME) of the Federal University of Rio Grande do Sul (UFRGS). The theoretical framework draws on Andrade (2002), Costa (1997), Gondar (2005) and Halbwachs (2006), among others. The methodology consisted of a single case study involving bibliographic and documentary research, photographic records, observation and eighteen semi-structured interviews with faculty and technical-administrative staff of the IME, both managers and non-managers, and with the managers of the bodies responsible for document management and for the dissemination of institutional memory at UFRGS. The systematized data were analyzed using thematic content analysis. The analytical contextualization of the IME sheds light on its memories from the perspective of faculty managers, starting from the accounts of the pioneers: from the Institute of Mathematics, referred to as the old face of the IME (1959-1985), through the change of headquarters, to the so-called contemporary face (1985-2016), in which recent changes culminated in the change of the institute's own name to Institute of Mathematics and Statistics (IME). The results identify: 1) the presence of macro- and micro-social and institutional interrelations, presenting the IME in relation to UFRGS and to the social demands that mobilize it; 2) that the structural division of the bodies responsible for document management policies and for the dissemination of memory policies at UFRGS, together with the limited number of archivists, may be restricting their scope of action to distinct sectors that are not fully aligned within the university; and 3) that institutional memory takes a two-dimensional and inseparable form, in which two faces can be identified: a) the face of remembrance, associated with oral expression, with the communication of practices and tacit knowledge, and with the long tenure and legitimacy frequently associated with faculty managers; and b) the face of forgetting, associated with documentary records, whose management, although carried out predominantly by technical-administrative managers, is subject to fluctuations and discontinuities.
Institutional Memory, Document management, Collective memory, Social memory, Statistic and Mathematics Institute
54	Modelagem matemática como ambiente de aprendizagem de estatística na Educação Básica / Mathematical modelling as a learning environment for statistics in basic education Machado, Minéia Bortole January 2017
This qualitative research is a case study that experiments with Mathematical Modelling as a Learning Environment for introducing Statistics syllabus content. The question that guided the research was: “Does a Mathematical Modelling Environment favor the learning of Statistics in lower secondary education?” To answer it, activities were designed around the context of the class. We developed a didactic sequence based on questions aimed at reflection and investigation. In this scenario, the teacher's role is to encourage students' autonomy and their ability to devise strategies for solving problems. It is an open plan in which the students' prior knowledge and their doubts carry greater responsibility in the learning process. We chose Mathematical Modelling as the methodology because it meets the objectives of our work: to give meaning to Mathematics by bringing it closer to the student's reality, to develop students' autonomy, and to encourage them to reflect on and critique facts from society. We want the content to be introduced within a context that refers to the student's day-to-day life. Our expectation is that, by understanding Statistics and its role in society, students will be able to use it as a tool for analyzing their own reality. The didactic sequence was applied in a 7th-grade elementary class at a public school in Sapucaia do Sul – RS. Based on this work, we judge that using Mathematical Modelling as a Learning Environment favors the learning of Statistics. We believe that students became more involved in the activities as Mathematics came closer to their reality. Throughout the work with the students, we observed progress in their understanding of the content covered, which we attribute to their greater involvement in the Learning Environments provided by Mathematical Modelling.
Mathematical Modelling, Learning Environment, Statistical Research, Statistic
55	Um estudo sobre estimação e predição em modelos geoestatísticos bivariados / A study on estimation and prediction in bivariate geostatistical models Bruno Henrique Fernandes Fonseca 05 March 2009
Bivariate geostatistical models define random functions for two stochastic processes with known spatial locations. The existence of a latent Gaussian random field can be assumed for each random variable. This Gaussianity assumption for the latent process is convenient for inference on the model parameters and for spatial prediction, since the joint distribution for a set of points of the latent process is also Gaussian. The covariance matrix of this distribution must be positive definite and must carry the spatial variability structure between and within the attributes. Gelfand et al. (2004) and Diggle and Ribeiro Jr. (2007) proposed strategies for structuring this matrix, but there are few reports on the use and comparative evaluation of these approaches. This work reports a simulation study of bivariate geostatistical models together with maximum likelihood estimation and ordinary kriging, under different sampling configurations of spatial locations. Data from the soil analysis of a 51.8 ha agricultural property, sampled at 67 georeferenced locations, were also used. The measured values of soil pH and base saturation were submitted to spatial descriptive analysis, univariate and bivariate geostatistical modelling, and spatial prediction. To check for advantages of adopting univariate or bivariate models, the base saturation sample, which is more expensive to collect, was divided into a modelling subsample and a control subsample. The first was used to fit the geostatistical models, and the second was used to compare the precision of the spatial predictions at the locations omitted from the modelling process.
Random fields, Geostatistic, Gaussian processes, Simulation (Statistic), Likelihood
56	An Empirical Investigation of Marascuilo's Ú₀ Test with Unequal Sample Sizes and Small Samples Milligan, Kenneth W. 08 1900
The study seeks to determine the effect upon the Marascuilo Ú₀ statistic of violating the small sample assumption. The study employed a Monte Carlo simulation technique to vary the sample size and the degree of inequality of sample sizes within experiments in order to determine the effect of such conditions. Twenty-two simulations, with 1200 trials each, were used. The following conclusion appeared to be appropriate: the Marascuilo Ú₀ statistic should not be used with small sample sizes, and it is recommended that the statistic be used only if sample sizes are larger than ten.
statistic methods, Statistical hypothesis testing, Nonparametric statistics
57	Estudo estatístico do desempenho analítico das técnicas eletroquímicas VPD, VOQ e MVOQ para a redução do herbicida paraquat em UME-Au e sua quantificação em águas puras e naturais / Statistic study of the analytical performance of DPV, SWV and MSWV electrochemical techniques for the reduction of the paraquat herbicide on an Au-UME and its analysis and quantification in pure and natural waters Silva, Osmair Benedito da 30 January 2008
A statistical study was carried out of the analytical performance of the electrochemical techniques differential pulse voltammetry (DPV), square wave voltammetry (SWV) and multiple square wave voltammetry (MSWV) for the electrochemical reduction of the herbicide paraquat on a gold ultramicroelectrode (Au-UME), aiming at its analysis and quantification in pure and natural waters. The electrochemical reduction of paraquat was studied in 0.1 mol L-1 Na2SO4 supporting electrolyte at pH = 5.5. The electrochemical responses showed two well-defined reduction peaks, with potentials of -0.64 V and -0.94 V vs Ag/AgCl 3.0 mol L-1 for peaks 1 and 2, respectively. The analyses used peak 1, which is associated with a reversible reaction in solution, where the electrode surface acts only as the electron transfer mediator. Working curves were obtained using the optimized experimental and voltammetric parameters in supporting electrolyte prepared with ultrapurified water. For DPV, SWV and MSWV, the detection limits found for peak 1 using the method described by Miller & Miller were 55.35 ± 0.18 µg L-1, 37.50 ± 0.32 µg L-1 and 21.42 ± 0.51 µg L-1, respectively. The methodology was applied to water samples collected from the Mogi-Guaçu River, in São Carlos County, state of São Paulo. The variation in slope between the working curves obtained in pure water and those obtained in river water was practically negligible for the three techniques, showing that the organic matter present in natural waters caused only minor interference in the measurements. The lowest measurable paraquat concentrations in the natural water samples were below the maximum residue level allowed by Brazilian legislation for waste waters (100 µg L-1), indicating that the proposed methodology is viable for this application. The detection and quantification limits, the evaluation of experimental errors and the confidence limits were calculated by the statistical procedure described by Miller & Miller. The detection limits obtained in this way have a physically meaningful and realistic magnitude, since they were determined by interpolation on the analytical curves, built from the relationship between pesticide concentrations and measured signals. The same limits were calculated by the method recommended by IUPAC, yielding 29.43, 3.63 and 0.37 µg L-1 for DPV, SWV and MSWV, respectively, for pure water measurements. These results differ by up to two orders of magnitude from those obtained by the statistical methodology and, if not properly evaluated, may lead the analyst to false conclusions.
natural waters, electrochemical techniques, paraquat, statistic
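The contrast drawn in record 57 between a regression-based detection limit and a blank-based one can be made concrete with a short sketch. It assumes two common textbook formulations: a regression-based limit that takes the calibration intercept as the blank signal and the residual standard deviation s_y/x as its standard deviation, giving 3·s_y/x / |slope| in concentration units, and a blank-based limit of 3·s_blank / |slope|. Whether these match the exact procedures applied in the thesis is not stated in the abstract, and all data and function names below are hypothetical.

```python
import numpy as np

def calibration_fit(conc, signal):
    """Least-squares line signal = intercept + slope * conc, plus residual s_y/x."""
    conc = np.asarray(conc, dtype=float)
    signal = np.asarray(signal, dtype=float)
    slope, intercept = np.polyfit(conc, signal, 1)
    residuals = signal - (intercept + slope * conc)
    s_yx = np.sqrt(np.sum(residuals ** 2) / (len(conc) - 2))  # n - 2 degrees of freedom
    return slope, intercept, s_yx

def lod_regression(conc, signal):
    """Regression-based limit: 3 * s_y/x / |slope| (assumed formulation)."""
    slope, _, s_yx = calibration_fit(conc, signal)
    return 3.0 * s_yx / abs(slope)

def lod_blank(blank_signals, slope):
    """Blank-based limit: 3 * s_blank / |slope| (assumed formulation)."""
    return 3.0 * np.std(blank_signals, ddof=1) / abs(slope)

# Hypothetical calibration points and replicate blank readings (arbitrary units).
conc = [0.0, 1.0, 2.0, 3.0]
signal = [0.1, 1.9, 3.9, 6.1]
blanks = [0.01, -0.01, 0.02, -0.02]

slope, _, _ = calibration_fit(conc, signal)
lod_reg = lod_regression(conc, signal)
lod_blk = lod_blank(blanks, slope)
print(lod_reg, lod_blk)
```

With these made-up numbers the blank-based limit comes out several times lower than the regression-based one, because the scatter of the blanks is much smaller than the scatter around the calibration line. This mirrors, only qualitatively, the kind of gap the abstract warns about.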
58	Détection de ruptures et mouvement Brownien multifractionnaire / Change Point Detection and multifractional Brownian motion Fhima, Mehdi 13 December 2011 (has links) Dans cette thèse, nous développons une nouvelle méthode de détection de ruptures "Off-line", appelée Dérivée Filtrée avec p-value, sur des paramètres d'une suite de variables aléatoires indépendantes, puis sur le paramètre de Hurst d'un mouvement Brownien multifractionnaire. Cette thèse est composée de trois articles. Dans un premier article paru dans Sequential Analysis nous posons les bases de la méthode Dérivée Filtrée avec p-value (FDpV) en l'appliquant à une suite de variables aléatoires indépendantes. La méthode a une complexité linéaire en temps et en mémoire. Elle est constituée de deux étapes. La première étape utilisant la méthode Dérivée Filtrée détecte les bons instants de ruptures, mais également certaines fausses alarmes. La deuxième étape attribue une p-value à chaque instant de rupture potentiel détecté à la première étape, et élimine les instants dont la p-value est inférieure à un certain seuil critique. Nous démontrons les propriétés asymptotiques nécessaires à la calibration de la méthode. L'efficacité de la méthode a été prouvé tant sur des données simulées que sur des données réelles. Ensuite, nous nous sommes attaqués à l'application de la méthode pour la détection de ruptures sur le paramètre de Hurst d'un mouvement Brownien multifractionnaire. Cela s'est fait en deux phases. La première phase a fait l'objet d'un article à paraitre dans ESAIM P&S où nous avons établi un Théorème Central Limite pour l'estimateur du paramètre de Hurst appelé Increment Ratio Statistic (IRS). Puis, nous avons proposé une version localisée de l'IRS et démontré un TCL local pour estimer la fonction de Hurst d'un mouvement Brownien multifractionnaire. Les preuves sont intuitives et se distinguent par leur simplicité. 
Elles s'appuient sur le théorème de Breuer-Major et une stratégie originale appelée "freezing of time". La deuxième phase repose sur un nouvel article soumis pour publication. Nous adaptons la méthode FDpV pour détecter des ruptures sur l'indice de Hurst d'un mouvement Brownien fractionnaire constant par morceaux. La statistique sous-jacent de l'algorithme FDpV est un nouvel estimateur de l'indice de Hurst, appelé Increment Zero-Crossing Statistic (IZCS) qui est une variante de l'IRS. La combinaison des méthodes FDpV + IZCS constitue une procédure efficace et rapide avec une complexité linéaire en temps et en mémoire. / This Ph.D dissertation deals with "Off-line" detection of change points on parameters of time series of independent random variables, and in the Hurst parameter of multifrcational Brownian motion. It consists of three articles. In the first paper, published in Sequential Analysis, we set the cornerstones of the Filtered Derivative with p-Value method for the detection of change point on parameters of independent random variables. This method has linear time and memory complexities, with respect to the size of the series. It consists of two steps. The first step is based on Filtered Derivative method which detects the right change points as well as the false ones. We improve the Filtered Derivative method by adding a second step in which we compute the p-values associated to every single potential change point. Then we eliminate false alarms, i.e. the change points which have p-value smaller than a given critical level. We showed asymptotic properties needed for the calibration of the algorithm. The effectiveness of the method has been proved both on simulated data and on real data. Then we moved to the application of the method for the detection of change point on the Hurst parameter of multifractional Brownian motion. This was done in two phases. 
The first phase is the subject of a paper to be published in ESAIM P&S, in which we establish a Central Limit Theorem for the Increment Ratio Statistic (IRS) of a multifractional Brownian motion, leading to a CLT for the time-varying Hurst index. The proofs are quite simple, relying on the Breuer-Major theorem and an original "freezing of time" strategy. The second phase relies on a new paper submitted for publication. We adapt the FDpV method to detect change points in the Hurst parameter of a piecewise fractional Brownian motion. The statistic underlying the FDpV algorithm is a new estimator of the Hurst index, the Increment Zero-Crossing Statistic (IZCS), which is a variant of the IRS. Both FDpV and IZCS have linear time and memory complexity with respect to the size of the series. Dérivée Filtrée avec p-value; Mouvement Brownien multifractionnaire; Paramètre de Hurst; Increment Ratio Statistic; Increment Zero-Crossing Statistic; Filtered Derivative with p-Value; Multifractional Brownian motion; Hurst parameter; Increment Ratio Statistic; Increment Zero-Crossing Statistic
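The two-step structure described in this abstract can be illustrated with a minimal Python sketch for changes in the mean of a sequence. Everything here is an assumption made for illustration, not the thesis's implementation: the function names, the sliding-window mean difference used as the filtered derivative, the normal-approximation two-sample test used for the p-values, and the parameters `window`, `threshold` and `alpha`. The sketch follows the usual hypothesis-testing convention in which a small p-value marks a significant change, so candidates with large p-values are the ones discarded as false alarms; the thesis may define its p-value differently.

```python
import math


def filtered_derivative(x, window):
    """Absolute difference between the means of the `window` points after
    and before each index k (zero where a full window does not fit)."""
    n = len(x)
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)  # prefix sums: each window mean costs O(1)
    d = [0.0] * n
    for k in range(window, n - window + 1):
        right = (prefix[k + window] - prefix[k]) / window
        left = (prefix[k] - prefix[k - window]) / window
        d[k] = abs(right - left)
    return d


def step1_candidates(d, window, threshold):
    """Step 1: repeatedly take the largest value above `threshold` and blank
    out its neighbourhood. Returns true change points and false alarms alike."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    d = list(d)
    candidates = []
    while d:
        k = max(range(len(d)), key=d.__getitem__)
        if d[k] <= threshold:
            break
        candidates.append(k)
        for j in range(max(0, k - window), min(len(d), k + window + 1)):
            d[j] = 0.0
    return sorted(candidates)


def _mean_var(segment):
    n = len(segment)
    mean = sum(segment) / n
    var = sum((v - mean) ** 2 for v in segment) / (n - 1)
    return mean, var, n


def step2_p_values(x, candidates):
    """Step 2: for each candidate, test whether the segments on either side
    (delimited by the neighbouring candidates) have equal means.
    Assumes window >= 2 so that every segment has at least two points."""
    bounds = [0] + list(candidates) + [len(x)]
    p_values = []
    for i in range(1, len(bounds) - 1):
        m1, v1, n1 = _mean_var(x[bounds[i - 1]:bounds[i]])
        m2, v2, n2 = _mean_var(x[bounds[i]:bounds[i + 1]])
        se = math.sqrt(v1 / n1 + v2 / n2)
        if se == 0.0:
            p = 0.0 if m1 != m2 else 1.0
        else:
            z = abs(m1 - m2) / se
            p = math.erfc(z / math.sqrt(2.0))  # two-sided normal p-value
        p_values.append(p)
    return p_values


def fdpv(x, window, threshold, alpha):
    """Keep only the step-1 candidates whose p-value is below `alpha`."""
    candidates = step1_candidates(filtered_derivative(x, window), window, threshold)
    p_values = step2_p_values(x, candidates)
    return [k for k, p in zip(candidates, p_values) if p < alpha]
```

On a series with one genuine level shift and one isolated outlier, step 1 of this sketch flags both, and step 2 is what removes the candidates caused by the outlier. Note that the repeated maximum search in `step1_candidates` is kept simple rather than optimized, so the sketch does not reproduce the linear complexity claimed for the original method.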
59	Differential item functioning procedures for polytomous items when examinee sample sizes are small Wood, Scott William 01 May 2011 As part of test score validity, differential item functioning (DIF) is a quantitative characteristic used to evaluate potential item bias. In applications where a small number of examinees take a test, statistical power of DIF detection methods may be affected. Researchers have proposed modifications to DIF detection methods to account for small focal group examinee sizes for the case when items are dichotomously scored. These methods, however, have not been applied to polytomously scored items. Simulated polytomous item response strings were used to study the Type I error rates and statistical power of three popular DIF detection methods (Mantel test/Cox's β, Liu-Agresti statistic, HW3) and three modifications proposed for contingency tables (empirical Bayesian, randomization, log-linear smoothing). The simulation considered two small sample size conditions, the case with 40 reference group and 40 focal group examinees and the case with 400 reference group and 40 focal group examinees. In order to compare statistical power rates, it was necessary to calculate the Type I error rates for the DIF detection methods and their modifications. Under most simulation conditions, the unmodified, randomization-based, and log-linear smoothing-based Mantel and Liu-Agresti tests yielded Type I error rates around 5%. The HW3 statistic was found to yield higher Type I error rates than expected for the 40 reference group examinees case, rendering power calculations for these cases meaningless. Results from the simulation suggested that the unmodified Mantel and Liu-Agresti tests yielded the highest statistical power rates for the pervasive-constant and pervasive-convergent patterns of DIF, as compared to other DIF method alternatives. 
Power rates improved by several percentage points if log-linear smoothing methods were applied to the contingency tables prior to using the Mantel or Liu-Agresti tests. Power rates did not improve if Bayesian methods or randomization tests were applied to the contingency tables prior to using the Mantel or Liu-Agresti tests. ANOVA tests showed that statistical power was higher when 400 reference examinees were used versus 40 reference examinees, when impact was present among examinees versus when impact was not present, and when the studied item was excluded from the anchor test versus when the studied item was included in the anchor test. Statistical power rates were generally too low to merit practical use of these methods in isolation, at least under the conditions of this study. Bayesian; Differential Item Functioning; Liu-Agresti Statistic; Log-Linear Smoothing; Polytomous Items; Sample Size; Educational Psychology
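For readers unfamiliar with the Mantel test named in this abstract, the following Python sketch computes the standard Mantel chi-square statistic for one polytomous item from stratified reference/focal contingency tables. It is only an illustration under stated assumptions: the function name, the input layout (one pair of count lists per matching-score stratum) and the decision to skip strata that contain a single group are choices made here, and none of the small-sample modifications studied in the dissertation (empirical Bayesian, randomization, log-linear smoothing) are included.

```python
import math


def mantel_chi_square(strata, scores):
    """Mantel test for DIF on one polytomous item.

    strata: list of (ref_counts, focal_counts) pairs, one per matching-score
            level; each list holds the number of examinees per item score.
    scores: the numeric item score attached to each category.
    Returns (chi_square, p_value) with one degree of freedom.
    """
    observed = expected = variance = 0.0
    for ref, focal in strata:
        n_ref, n_focal = sum(ref), sum(focal)
        n = n_ref + n_focal
        if n < 2 or n_ref == 0 or n_focal == 0:
            continue  # a stratum with only one group carries no information
        column = [r + f for r, f in zip(ref, focal)]
        s1 = sum(y * c for y, c in zip(scores, column))      # sum of scores
        s2 = sum(y * y * c for y, c in zip(scores, column))  # sum of squared scores
        observed += sum(y * f for y, f in zip(scores, focal))
        expected += n_focal * s1 / n
        variance += n_ref * n_focal * (n * s2 - s1 * s1) / (n * n * (n - 1))
    if variance == 0.0:
        raise ValueError("no stratum contributes any variance")
    chi_square = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value
```

When the focal and reference groups have the same score distribution within every stratum, the observed and expected focal score sums coincide and the statistic is zero. With focal groups as small as those in the study (40 examinees spread over several strata), many strata are sparse, which is the situation the modifications examined in the dissertation are meant to address.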
60	Symbolic Regression of Thermo-Physical Model Using Genetic Programming Zhang, Ying 06 April 2004 The symbolic regression problem is to find a function, in symbolic form, that fits a given data set. Symbolic regression thus provides a means of function identification. This research describes an adaptive hybrid system for symbolic function identification of thermo-physical models that combines genetic programming with a modified Marquardt nonlinear regression algorithm. The Genetic Programming (GP) system extracts knowledge from the data in the form of symbolic expressions, i.e. tree structures, which are used to model and derive equations of state, mixing rules and phase behavior from experimental data (property estimation). During the automatic evolution process of GP, the functional structure of a generated individual can become highly complicated. To ensure the convergence of the regression, a modified Marquardt regression algorithm is used. Two stopping criteria are added to the traditional Marquardt algorithm to force it to repeat the regression process before it stops. Statistical analysis is applied to the fitted model: a residual plot is used to assess the goodness of fit, and the χ²-test is used to test the model's adequacy. Ten experiments are run with different forms of input variables, numbers of data points, standard errors added to the data set, and fitness functions. The results show that the system is able to find models and optimize their parameters successfully. machine learning; function identification; statistic analysis; modeling; nonlinear regression; American Studies; Arts and Humanities
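The idea of representing candidate models as expression trees and scoring them against data can be sketched in a few lines of Python. This is a hypothetical illustration only: the nested-tuple tree encoding, the function names and the chi-square-style fitness with an assumed known measurement error `sigma` are choices made here, and the sketch contains neither the evolutionary search nor the modified Marquardt parameter fitting described in the abstract.

```python
import operator

# Binary operators allowed in the illustrative expression trees.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, x):
    """Evaluate a tree that is the variable "x", a numeric constant,
    or a tuple (operator, left_subtree, right_subtree)."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return float(tree)
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))


def to_infix(tree):
    """Render the tree as a readable symbolic expression."""
    if tree == "x" or isinstance(tree, (int, float)):
        return str(tree)
    op, left, right = tree
    return f"({to_infix(left)} {op} {to_infix(right)})"


def chi_square(tree, data, sigma):
    """Sum of squared residuals scaled by the assumed measurement error."""
    return sum(((y - evaluate(tree, x)) / sigma) ** 2 for x, y in data)


def best_candidate(candidates, data, sigma):
    """Select the candidate tree with the smallest chi-square value."""
    return min(candidates, key=lambda tree: chi_square(tree, data, sigma))
```

For example, with data generated from y = x·x + 1, the tree `("+", ("*", "x", "x"), 1.0)` scores zero and is preferred over `("+", "x", 1.0)`. In a full genetic programming system, selection of this kind would be applied to a population of trees modified by crossover and mutation, with the numeric constants tuned by a nonlinear regression step.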
