1 |
Verificação dos efeitos das variâncias e das relações de variáveis ligadas à pecuária de leite no agrupamento dos produtores / Verification of the effects of variances and of the relationships among variables related to milk production in the grouping of dairy farmers
Campana, Ana Carolina Mota 16 February 2009
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Research today often collects information on many variables from a large number of experimental units, producing and storing large amounts of data that call for methods able to handle them. Statistical methods such as principal component analysis (PCA), which can reduce the dimensionality of the analysis without significant loss of information, are therefore of great interest. PCA can be based on either the covariance matrix (S) or the correlation matrix (R) of the variables, and the two choices may yield different principal components (PCs). To indicate the best strategies for different scenarios, we conducted a simulation study of how variable scaling affects the viability and quality of PCA results used to cluster experimental units. We also conducted a second study using animal science and economic variables from 255 dairy producers in three regions of Minas Gerais State, to identify the data structure that best classifies the most economically viable producers. In both studies we applied a transformation of the variables based on their coefficients of variation; the covariance matrix of the transformed variables was named S*. The S matrix favored the economic variables with the largest variances, whereas the R matrix treated the variables most correlated with one another as the most important. Computing the PCs from S* minimized the scaling problems inherent in S and R: the former takes the scale of the variables fully into account, while the latter disregards it entirely.
We concluded that S* was the most appropriate matrix for the present case study because it ranked as most important the economic variables most closely related to the animal science variables.
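The contrast between S and R described in this abstract can be illustrated with a small sketch. It is not the author's simulation: the three variables, their scales, and the helper `principal_components` are hypothetical, and the S* transformation is left out because the abstract does not give its formula.

```python
import numpy as np

def principal_components(matrix):
    """Return eigenvalues (descending) and matching eigenvectors of a symmetric matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

rng = np.random.default_rng(0)
n = 255  # number of units borrowed from the case study; the data are synthetic

# Hypothetical variables: x1 and x2 are strongly correlated and small in scale,
# x3 is independent of them and measured on a much larger scale.
z = rng.normal(size=n)
x1 = z + 0.3 * rng.normal(size=n)
x2 = z + 0.3 * rng.normal(size=n)
x3 = 1000.0 * rng.normal(size=n)
data = np.column_stack([x1, x2, x3])

S = np.cov(data, rowvar=False)       # covariance matrix
R = np.corrcoef(data, rowvar=False)  # correlation matrix

_, vectors_S = principal_components(S)
_, vectors_R = principal_components(R)

# Absolute loadings of the first principal component under each matrix.
first_pc_S = np.abs(vectors_S[:, 0])
first_pc_R = np.abs(vectors_R[:, 0])

print("First PC loadings from S:", np.round(first_pc_S, 3))
print("First PC loadings from R:", np.round(first_pc_R, 3))
```

Under these assumptions the first component from S loads almost entirely on the large-variance variable, while the first component from R loads mainly on the correlated pair, and R itself does not change when the columns are rescaled.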
|
2 |
Large-scale Comparative Study of Hi-C-based Chromatin 3D Structure Modeling Methods
Wang, Cheng 17 May 2018
Chromatin is a complex polymer molecule in eukaryotic cells, consisting primarily of DNA and histones. Many studies have shown that the 3D folding of chromatin plays an important role in DNA expression. The recently proposed Chromosome Conformation Capture technologies, especially Hi-C assays, provide an opportunity to study how the 3D structures of chromatin are organized. Many chromatin 3D structure modeling methods based on Hi-C data have been proposed. However, there is limited ground truth to validate these methods, and there are no robust chromatin structure alignment algorithms to evaluate their performance.
In our work, we first conducted a thorough literature review of 25 publicly available population Hi-C-based chromatin 3D structure modeling methods. To evaluate and compare the performance of these methods, we then proposed a novel data simulation method that combines population Hi-C data and single-cell Hi-C data without ad hoc parameters. We also designed a global and a local alignment algorithm to measure the similarity between the templates and the chromatin structures predicted by the different modeling methods. Finally, the results of large-scale comparative tests indicated that our alignment algorithms significantly outperform the algorithms in the literature.
|
3 |
A Simulation Study On Marginalized Transition Random Effects Models For Multivariate Longitudinal Binary Data
Yalcinoz, Zerrin 01 May 2008
In this thesis, a simulation study is conducted and a statistical model is fitted to the simulated data. The data are assumed to represent the satisfaction of customers who withdraw their salary from a particular bank. They are longitudinal data with a bivariate binary response, assumed to be collected from 200 individuals at four time points. In such data sets, two types of dependence are important, the dependence among measurements within a subject and the dependence between responses, and both are accounted for in the model. The model is a Marginalized Transition Random Effects Model, which has three levels. The first level measures the effect of covariates on the responses, the second accounts for temporal changes, and the third measures differences between individuals. Markov Chain Monte Carlo methods are used to fit the model. In the simulation study, the differences between the estimated values and the true parameters are examined under two conditions: when the model is correctly specified and when it is not. The results suggest that better convergence is obtained with the full model. The third level, which captures individual differences, is more sensitive to model misspecification than the other levels of the model.
|
4 |
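Eficiência de estimadores, geradores e algoritmos na simulação de dados diários de precipitação pluviométrica utilizando a distribuição gama / Efficiency of estimators, generators and algorithms in the simulation of daily rainfall data using the gamma distribution
Rickli, Leila Issa [UNESP] 05 May 2006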
Universidade Estadual Paulista (UNESP) / The growth of the world's population has demanded ever higher agricultural productivity to meet its food needs. Climatic variables, rainfall among them, are one of the most important factors determining the success or failure of this production. This research analyzed the efficiency of the functional factors in the process of simulating daily precipitation data using the Gamma distribution. Daily climatological series for Piracicaba, SP and Ponta Grossa, PR were used. To estimate the parameters of the Gamma distribution (α and β), procedures based on the method of moments, the likelihood method, and the numerical method of Greenwood & Durand were evaluated. Three congruential pseudorandom number generators and two computational algorithms for generating Gamma random variables, all implemented in the Sedac_R simulator, were also evaluated. Statistical validation indicated that the choice of both the parameter estimation method and the algorithm for generating the Gamma random variable must be taken into account when simulating climatic precipitation series. As for the pseudorandom number generator, the results showed that it does not interfere with the accuracy of the generated data.
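Of the three estimation procedures named in this abstract, the method of moments is the simplest to sketch. The example below assumes the shape/scale parameterization, in which the mean is αβ and the variance is αβ²; the abstract does not state which parameterization the thesis uses. The sample is synthetic, drawn with NumPy's generator rather than the Sedac_R simulator, and the "true" parameter values are hypothetical.

```python
import numpy as np

def gamma_moment_estimates(sample):
    """Method-of-moments estimates for a Gamma(shape=alpha, scale=beta) sample.

    Solving mean = alpha * beta and variance = alpha * beta**2 gives
    alpha = mean**2 / variance and beta = variance / mean.
    """
    sample = np.asarray(sample, dtype=float)
    mean = sample.mean()
    variance = sample.var(ddof=1)
    alpha_hat = mean**2 / variance
    beta_hat = variance / mean
    return alpha_hat, beta_hat

rng = np.random.default_rng(42)
true_alpha, true_beta = 0.8, 12.0  # hypothetical values, not taken from the thesis

# Synthetic wet-day rainfall amounts (mm) drawn from the assumed Gamma model.
rain = rng.gamma(shape=true_alpha, scale=true_beta, size=20000)

alpha_hat, beta_hat = gamma_moment_estimates(rain)
print(f"alpha estimate: {alpha_hat:.3f}, beta estimate: {beta_hat:.3f}")
```

In practice only days with rain would enter such a fit, since the Gamma distribution is defined for strictly positive values.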
|
5 |
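Eficiência de estimadores, geradores e algoritmos na simulação de dados diários de precipitação pluviométrica utilizando a distribuição gama / Efficiency of estimators, generators and algorithms in the simulation of daily rainfall data using the gamma distribution
Rickli, Leila Issa, 1948- January 2006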
Abstract: The growth of the world's population has demanded ever higher agricultural productivity to meet its food needs. Climatic variables, rainfall among them, are one of the most important factors determining the success or failure of this production. This research analyzed the efficiency of the functional factors in the process of simulating daily precipitation data using the Gamma distribution. Daily climatological series for Piracicaba, SP and Ponta Grossa, PR were used. To estimate the parameters of the Gamma distribution (α and β), procedures based on the method of moments, the likelihood method, and the numerical method of Greenwood & Durand were evaluated. Three congruential pseudorandom number generators and two computational algorithms for generating Gamma random variables, all implemented in the Sedac_R simulator, were also evaluated. Statistical validation indicated that the choice of both the parameter estimation method and the algorithm for generating the Gamma random variable must be taken into account when simulating climatic precipitation series. As for the pseudorandom number generator, the results showed that it does not interfere with the accuracy of the generated data. / Advisor: Ângelo Catâneo / Co-advisor: Jorim Souza Virgens Filho / Committee: Célia Regina Lopes Zimback / Committee: Manoel Henrique Salgado / Committee: Marcelo Giovaneti Canteri / Committee: José Fernando Mantovani Micali / Doctorate
|
6 |
Monitoring Tools File Specification: Version 1.0
Vogelsang, Stefan January 2016
This paper describes the format of the monitoring data files collected at external measuring sites and in laboratory experiments at the Institute for Building Climatology (IBK). Monitoring data files are containers for storing time series or event-driven data collected as input for transient heat and moisture transport simulations. Further applications are the documentation of real-world behaviour and laboratory experiments, and the collection of validation data sets for simulation results (whole building / energy consumption / HAM). The article also discusses the application interface to measurement data verification tools, as well as data storage solutions that can be used to archive measurement data files conveniently and efficiently.
1 Introduction
2 File Name Conventions
3 Headers
3.1 Specifics on Time Series Header Files
3.2 Specifics on Event Driven Header Files
4 Data Section Format Description
5 SI Unit Strings
6 Competition Law Advice
7 Liability for external Links
|
7 |
Human Age Prediction Based on Real and Simulated RR Intervals using Temporal Convolutional Neural Networks and Gaussian Processes
Pfundstein, Maximilian January 2020
Electrocardiography (ECG) is a non-invasive method used in medicine to track the electrical impulses sent by the heart. The time between two consecutive electrical impulses, and hence between heartbeats of a subject, is referred to as an RR interval. Previous studies show that RR intervals can be used to identify sleep patterns and cardiovascular diseases. Additional research indicates that RR intervals can be used to predict the cardiovascular age of a subject. This thesis investigates whether this assumption holds, using two different datasets as well as simulated data based on Gaussian Processes. The datasets are Holter recordings provided by the University of Gdańsk and a dataset provided by Physionet. The former is a balanced dataset of recordings made during the nocturnal sleep of healthy subjects, whereas the latter is an imbalanced dataset of whole-day recordings of subjects who suffered from myocardial infarction. Feature-based models are trained, as well as a deep learning architecture called DeepSleep, which is based on a paper on sleep stage detection. The results show that predicting a subject's age from RR intervals alone is difficult. For the first dataset, the highest test accuracy obtained is 37.84 per cent, against a baseline of 18.23 per cent. For the second dataset, the highest accuracy obtained is 42.58 per cent, against a baseline of 39.14 per cent. Furthermore, data are simulated by fitting Gaussian Processes to the first dataset, following a Bayesian approach that assumes a distribution for every hyperparameter of the kernel function in use. The hyperparameter distributions are continuously updated by fitting a Gaussian Process to slices of around 2.5 minutes. Samples from the fitted Gaussian Process are then taken as simulated data, handling impurity and padding. With the simulated data, the highest accuracy achieved is 31.12 per cent, against a baseline of 18.23 per cent.
In conclusion, cardiovascular age prediction based on RR intervals is a difficult problem, and complex handling of impurity does not necessarily improve the results.
|
8 |
Análise espectral, geração de estrutura e simulação de dados de RMN 13C / Steroids: spectral analysis, structure generation and simulation of 13C NMR data
Ferreira, Marcelo José Pena 24 October 2003
The aim of the expert system SISTEMAT is to aid natural product researchers in the structural determination of substances. To that end, using data from various spectrometric and spectroscopic techniques, mainly 13C NMR, numerous programs were developed to propose the most probable skeleton of a substance. This information, together with the substructures derived from a data set, is used by structure generators as strong constraints in order to prevent combinatorial explosion and the generation of structural proposals incompatible with natural products, as well as to reduce the long computation time spent on an analysis. This work describes the development and use of the modules for skeleton recognition, structure determination and generation, and simulation of 13C NMR data of steroids. A database was built containing 1436 substances distributed among 119 skeleton types from a wide variety of natural sources. Several tests were performed, and good hit rates were obtained for skeleton recognition and for the generation of structural proposals through the overlapping of the ring types found in steroid skeletons. To validate the structural proposals produced by the generator, and to predict the chemical shift data of new steroids, the 13C NMR data simulator was used; compared with a commercial program serving the same purpose, it showed higher accuracy in data prediction.
|
9 |
Segmentação de imagens de radar de abertura sintética por crescimento e fusão estatística de regiões / Segmentation of synthetic aperture radar images by growth and statistical fusion of the regions
Eduardo Alves de Carvalho 23 May 2005
Conselho Nacional de Desenvolvimento Científico e Tecnológico / The regular coverage of almost the entire planet by spaceborne synthetic aperture radar (SAR) systems, together with the use of airborne systems, has provided new means of gathering remote sensing information on various regions of the planet, many of them inaccessible. This work deals with the digital processing of SAR images, specifically segmentation, which consists of isolating or partitioning the relevant objects in a scene in order to improve image interpretation in subsequent tasks. SAR images are corrupted by coherent noise, known as speckle, which masks small details and transition zones between objects. This noise is inherent in the image formation process and hinders tasks such as automatic segmentation of the objects and identification of their contours. One way to segment SAR images is to filter the speckle noise beforehand. The other, applied in this work, is to segment the noisy image directly, using its original pixels as the source of information, without any preprocessing such as filtering. To this end, a segmentation algorithm based on region growing and statistical region merging was developed; it requires a few parameters to control the process. Using the original data has the advantages of eliminating preprocessing steps and favoring the detection of the structures present in the images. The segmented images are evaluated qualitatively and quantitatively, under different conditions, by applying the proposed technique to test images artificially contaminated with multiplicative noise. The segmentation method is also applied to real SAR images, and the results are promising.
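The evaluation on test images contaminated with multiplicative noise can be pictured with a short sketch. The abstract does not specify the noise distribution, so the example assumes a common model for multi-look intensity images, unit-mean Gamma speckle with `looks` as the shape parameter; the function `add_speckle`, the two-region test image, and all numeric values are hypothetical.

```python
import numpy as np

def add_speckle(image, looks, rng):
    """Multiply an image by unit-mean Gamma noise (assumed multi-look intensity speckle).

    Gamma(shape=looks, scale=1/looks) has mean 1 and variance 1/looks,
    so the local mean is preserved and the noise scales with the signal.
    """
    noise = rng.gamma(shape=looks, scale=1.0 / looks, size=image.shape)
    return image * noise

rng = np.random.default_rng(7)

# Hypothetical test image: two homogeneous regions with different mean intensity.
test_image = np.full((200, 200), 50.0)
test_image[:, 100:] = 150.0

noisy = add_speckle(test_image, looks=4, rng=rng)

left = noisy[:, :100]
right = noisy[:, 100:]

# For multiplicative noise the coefficient of variation is the same in both
# regions (about 1/sqrt(looks)), while the standard deviation grows with the mean.
cv_left = left.std() / left.mean()
cv_right = right.std() / right.mean()
print(f"left:  mean={left.mean():.1f}, cv={cv_left:.3f}")
print(f"right: mean={right.mean():.1f}, cv={cv_right:.3f}")
```

Because the spread of the noise follows the region mean, a fixed intensity threshold separates the two regions poorly, which is one reason a statistical merging criterion is attractive for this kind of image.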
|
10 |
Análise espectral, geração de estrutura e simulação de dados de RMN 13C / Steroids: spectral analysis, structure generation and simulation of 13C NMR data
Marcelo José Pena Ferreira 24 October 2003
The aim of the expert system SISTEMAT is to aid natural product researchers in the structural determination of substances. To that end, using data from various spectrometric and spectroscopic techniques, mainly 13C NMR, numerous programs were developed to propose the most probable skeleton of a substance. This information, together with the substructures derived from a data set, is used by structure generators as strong constraints in order to prevent combinatorial explosion and the generation of structural proposals incompatible with natural products, as well as to reduce the long computation time spent on an analysis. This work describes the development and use of the modules for skeleton recognition, structure determination and generation, and simulation of 13C NMR data of steroids. A database was built containing 1436 substances distributed among 119 skeleton types from a wide variety of natural sources. Several tests were performed, and good hit rates were obtained for skeleton recognition and for the generation of structural proposals through the overlapping of the ring types found in steroid skeletons. To validate the structural proposals produced by the generator, and to predict the chemical shift data of new steroids, the 13C NMR data simulator was used; compared with a commercial program serving the same purpose, it showed higher accuracy in data prediction.
|