141 |
Parental and youth attributions, acculturation, and treatment engagement of Latino families in youth mental health services: a preliminary examination / Ho, Judy Keeching. January 2007 (has links)
Thesis (Ph. D.)--University of California, San Diego and San Diego State University, 2007. / Title from first page of PDF file (viewed May 29, 2007). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 53-65).
|
142 |
Análise exploratória de dados espaciais aplicada a produtividade de milho no estado do Paraná / Exploratory analysis of spatial data applied to corn yield in the state of Paraná / Seffrin, Rodolfo 20 April 2017 (has links)
Corn is one of the most important crops for the Brazilian economy, and statistical models can support decision making in this productive sector. This study aimed to identify areas with spatial correlation and autocorrelation between corn yield and its predictor variables (mean temperature, rainfall, solar radiation, agricultural soil potential and altitude), and to determine the spatial regression model best suited to explaining the crop. The study used data from municipalities of the state of Paraná for the summer harvests of the 2011/2012, 2012/2013 and 2013/2014 agricultural years. The software used for the statistical analysis and for generating the thematic maps were ArcMap 9.3 and GeoDa 1.6.7. Spatial dependence among the variables was identified with the global Moran's index (univariate and bivariate) and the local indicator of spatial association (LISA): for all years and neighborhood criteria used, there was spatial autocorrelation significant at the 1% level for all variables. Mean temperature, rainfall and altitude were significantly correlated (p-value < 5%) with corn yield in all years and under all criteria studied, whereas solar radiation and agricultural soil potential showed no significant correlation for some of the years (2012/2013) and neighborhood matrices (queen contiguity and nearest neighbor). To determine the most appropriate regression model for estimating corn yield, the diagnostics of the OLS (Ordinary Least Squares) regression model were used to check whether a spatial regression model was needed to explain the data. For all agricultural years the spatial lag model (SAR) was recommended, and only for the 2013/2014 agricultural year could the spatial error model (SEM) also be recommended. The spatial regressions (SAR and SEM) adopted to estimate corn yield in the different years gave better results than the regression that does not incorporate the spatial autocorrelation of the data (OLS): the coefficient of determination R², the Bayesian information criterion (BIC) and the maximum log-likelihood value all showed a significant improvement in the estimation of corn yield when SAR and SEM were used.
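As a rough illustration of the global Moran's index used in this record to detect spatial autocorrelation, the short Python sketch below computes Moran's I for a vector of yields and a row-standardized neighborhood weight matrix. The function, the toy yields and the contiguity matrix are illustrative assumptions, not the study's data or software (the study used ArcMap and GeoDa).

import numpy as np

def morans_i(y, w):
    # Global Moran's I for values y and an n x n spatial weight matrix w.
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = y.size
    z = y - y.mean()                       # deviations from the mean
    s0 = w.sum()                           # sum of all weights
    return (n / s0) * (z @ w @ z) / (z @ z)

# Toy example: four neighboring areas with hypothetical corn yields (kg/ha).
yields = [7200, 7400, 5100, 4900]
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
w = w / w.sum(axis=1, keepdims=True)       # row-standardize the weights
print(round(morans_i(yields, w), 3))       # positive value: similar yields cluster in space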
|
144 |
Testes rapidos (kits) para avaliação da qualidade de oleos, gorduras e produtos que os contenham e sua correlação com os metodos oficiais da AOCS / Rapid tests (kits) for evaluating the quality of oils, fats and products containing them and their correlation with the official AOCS methods / Osawa, Cibele Cristina 21 February 2005 (has links)
Advisor: Lireny Ap. G. Gonçalves / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia de Alimentos
Previous issue date: 2005 / Abstract: The DiaMed Food Analysis Test System (F.A.T.S.) consists of rapid tests that serve as alternatives to the conventional methods for determining free fatty acids (%FFA), peroxide value (PV), and the concentrations of alkenals (p-anisidine test) and malonaldehyde (TBA test). These tests can be used to evaluate oils, fats and products containing them. Moreover, the equipment is compact, so a highly equipped laboratory is not required, and the relatively small sample sizes generate less residue. The kits also provide better working conditions for the analysts than the conventional methods, replacing, for example, the p-anisidine reagent, which is considered carcinogenic. To date, only the %FFA and PV kits have been certified by the AOAC, and few studies have used them. The present study was therefore designed to correlate the kits with the official AOCS methods, applied to crude and degummed vegetable oils, refined oils and olive oils (evaluated with 4 kits covering different detection ranges of %FFA and PV), frying oils collected from the university restaurant (TBA and p-anisidine) and pet foods supplied by a national company (%FFA, PV, TBA and p-anisidine). In all cases the analyses were performed in triplicate using the conventional methods; on the following day, the same samples were analyzed with the corresponding kits according to the manufacturer's instructions. The official AOCS methods Ca 5a-40, Cd 8b-90, Cd 18b-90 and Cd 19b-90 were adopted for %FFA, PV, the p-anisidine test and the TBA test, respectively. For the pet food samples, the determinations were made on lipids extracted cold with a mixture of petroleum ether and ethyl ether. The correlation between methodologies was determined by least-squares linear regression (Minitab for Windows, version 12.1), and ANOVA was used to compare the means of the different methodologies (SAS for Windows V. 8). For the pet food and frying oil samples, the variation between samples tested with the same method was also determined. High correlations were found between the official-method and kit results for %FFA and PV in refined oils and olive oils, for %FFA in the pet foods and for the p-anisidine values of the frying oils (r = 0.74-0.99). Results for crude and degummed oils were significantly different, owing to interference from the pigments present. The pet food samples were at an early oxidative stage and therefore could not be evaluated precisely with the kits. For the frying oils, the TBA kit was more sensitive than the official method. In addition, shortcomings were found in the official methods for determining PV, malonaldehyde and alkenals: the peroxide method was not sensitive for PV below 2 meq/kg, the heating used in the official TBA test gave higher values than the equivalent kit, and the presence of water in the samples and reagents interfered with the p-anisidine results. The DiaMed F.A.T.S. kits thus gave accurate results and are a viable option for overcoming the economic limitations that shelf-life studies face with the conventional methods, especially for samples with a low lipid content. / Master's / Food Technology / Master in Food Technology
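The central statistical step of this study — checking how well kit readings track the official AOCS method by least-squares linear regression — can be sketched in a few lines of Python. The peroxide values below are invented for illustration, and the use of SciPy (rather than the Minitab and SAS packages used in the study) is an assumption.

import numpy as np
from scipy import stats

# Hypothetical peroxide values (meq/kg) for the same samples measured by both methods.
official_pv = np.array([0.5, 1.2, 2.3, 4.1, 6.0, 8.4, 10.2])
kit_pv = np.array([0.6, 1.1, 2.5, 4.0, 6.3, 8.1, 10.6])

# Least-squares linear regression of the kit results on the official results.
fit = stats.linregress(official_pv, kit_pv)
print(f"slope={fit.slope:.3f}  intercept={fit.intercept:.3f}  r={fit.rvalue:.3f}")
# An r close to 1 indicates close agreement between the kit and the official method.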
|
145 |
Information transmission by the synchronous activity of neuronal populations / Kruscha, Alexandra 21 September 2017 (has links)
Populations of sensory neurons encode information about the environment into electrical pulses, so-called action potentials or spikes. These spikes are passed on to postsynaptic neurons in the central nervous system, which process them using different readout strategies.
Integrator cells sum up all incoming action potentials and are thus sensitive to the overall activity of a presynaptic population.
Coincidence detectors, on the other hand, are activated by the synchronous firing of the afferent population. The main question of this thesis is: what information about a common time-dependent stimulus is encoded in the synchronous spikes of a neuronal population, in comparison to the sum of all spikes? We approach this question within the framework of spectral analysis of stochastic processes, which allows one to assess which frequency components of a signal are predominantly encoded. Here, in contrast to earlier studies, a synchronous event does not necessarily mean that all neurons of the population fire simultaneously, but that at least a prescribed fraction (the 'synchrony threshold') is active within a small time interval.
We derive analytical expressions of the correlation statistics which are compared to numerical simulations and experiments on weakly electric fish. We show that the information transmission of the synchronous output depends highly on the synchrony threshold. We uncover a symmetry in the synchrony threshold, unveiling the similarity in the encoding capability of the common firing and the common silence of a population. Our results demonstrate that the synchronous output can act as a band-pass filter of information, i.e. it extracts predominantly fast components of a stimulus. If signals in different frequency regimes are concurrently present, the selection of synchronous firing events can thus be a tool to separate these signals.
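To make the synchrony threshold concrete, the following Python sketch turns a set of binned spike trains into a synchronous population output that is 1 whenever at least a prescribed fraction of the neurons fires in the same time bin; the population size, firing probability and threshold are illustrative assumptions. The coherence between a stimulus and this output (versus the summed output) is the kind of spectral measure the thesis analyzes.

import numpy as np

rng = np.random.default_rng(0)

n_neurons, n_bins = 100, 10_000
p_fire = 0.05                                        # per-bin firing probability of each neuron
spikes = rng.random((n_neurons, n_bins)) < p_fire    # binary spike trains (neurons x time bins)

gamma = 0.10                                         # synchrony threshold: fraction of the population
active_fraction = spikes.mean(axis=0)                # fraction of neurons firing in each bin
summed_output = spikes.sum(axis=0)                   # integrator-like readout: total spike count
synchronous_output = (active_fraction >= gamma).astype(float)   # coincidence-detector-like readout

print("bins flagged as synchronous events:", int(synchronous_output.sum()))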
|
146 |
Discovery and adaptation of process views / Motahari Nezhad, Hamid Reza, Computer Science & Engineering, Faculty of Engineering, UNSW January 2008 (has links)
Business process analysis and integration are key endeavours for today's enterprises. Recently, Web services have been widely adopted for the implementation and integration of business processes within and across enterprises. In this dissertation, we investigate the problem of enabling the analysis of service interactions in today's enterprises, both in the context of business process executions and in that of service integration. Our study shows that only a fraction of interactions in the enterprise are supported by process-aware systems. Enabling the above-mentioned analyses requires: (i) a model of the underlying business process to be used as a reference for the analysis, and (ii) the ability to correlate events generated during service interactions into process instances. We refer to a process model and the corresponding process instances as a "process view". We propose the concept of a process space to refer to all process-related information sources in the enterprise, over which various process views are defined. We propose the design and development of a system called the "process space discovery system" (PSDS) for discovering process views in a process space. We introduce novel approaches for the correlation of events into process instances, focusing on the public processes of Web services (business protocols), and also for the discovery of business protocol models from the process instances of a process view. Analysis of service integration approaches shows that while standardisation in Web services simplifies integration at the communication level, at higher levels of abstraction (e.g., service interfaces and protocol models) services are still open to heterogeneities. We characterise the mismatches between service interfaces and protocol specifications and introduce "mismatch patterns" to represent them. A mismatch pattern also includes an adapter template that aims at the resolution of the captured mismatch. We also propose semi-automated approaches for identifying the mismatches between the interface and protocol specifications of two services. The proposed approaches have been implemented in prototype tools, and experimentally validated on synthetic and real-world datasets. The process views discovered using PSDS can be used to perform various analyses in an enterprise, and the proposed adaptation approach facilitates the adoption of Web services in business process integration.
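As a toy illustration of one step described above — correlating events generated during service interactions into process instances — the Python sketch below groups event records by a shared correlation attribute (an assumed 'order_id' field). Discovering which attributes and conditions actually correlate events is the hard part addressed by PSDS; this sketch assumes the key is already known.

from collections import defaultdict

# Hypothetical event log captured from service interactions: (event name, attributes).
events = [
    ("submitOrder", {"order_id": "A1", "customer": "c7"}),
    ("checkStock",  {"order_id": "A1"}),
    ("submitOrder", {"order_id": "B2", "customer": "c9"}),
    ("shipOrder",   {"order_id": "A1"}),
    ("cancelOrder", {"order_id": "B2"}),
]

def correlate(events, key):
    # Group events into process instances using a single correlation attribute.
    instances = defaultdict(list)
    for name, attrs in events:
        if key in attrs:
            instances[attrs[key]].append(name)
    return dict(instances)

print(correlate(events, "order_id"))
# {'A1': ['submitOrder', 'checkStock', 'shipOrder'], 'B2': ['submitOrder', 'cancelOrder']}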
|
147 |
Análisis de datos longitudinales y multivariantes mediante distancias con modelos lineales generalizados / Longitudinal and multivariate data analysis through distances with generalized linear models / Melo Martínez, Sandra Esperanza 06 September 2012 (has links)
We introduce new methodologies for the analysis of longitudinal data with continuous responses (univariate, multivariate for growth curves, and with non-normal responses using generalized linear models) based on distances between observations (or individuals) on the explanatory variables. In all cases, adding components of the principal coordinate matrix improves the predictions with respect to the classical models, thus providing an alternative prediction methodology.
Both the distance-based MANOVA model and the distance-based univariate longitudinal model were proven to be as robust as their classical counterparts, which use restricted maximum likelihood and weighted least squares under normality assumptions. The parameters of the distance-based univariate model were estimated by restricted maximum likelihood and generalized least squares; for the distance-based MANOVA, least squares under normality conditions was used. We also show how to perform inference on the model parameters in large samples.
We also present a methodology for the analysis of longitudinal data using generalized linear models and distances between observations on the explanatory variables, with results similar to the classical approach but with the added ability to model continuous, non-normal responses over time. After presenting the model and the ideas that motivate it, we show how to estimate the parameters and test hypotheses about them; estimation is carried out with generalized estimating equations (GEE).
An application in each chapter illustrates the proposed methodologies: the models were fitted and validated, the parameters estimated, and statistical inference carried out. Simulations showed small differences between the distance-based and classical methods for mixed data, particularly for small samples (around 50 individuals).
The simulations also showed that, for some sample sizes, the distance-based models fitted with the Gower distance improve on the traditional ones when the explanatory variables are mixed. This holds for small samples (50), regardless of the correlation, the autocorrelation structure, the variance and the number of time points, under both the Akaike (AIC) and the Bayesian (BIC) information criteria. Moreover, for these small samples the distance-based method is more efficient (efficiency greater than 1) than the classical one under the scenarios considered. The distance-based method also fits better in large samples (100 and 200) with high correlations (0.5 and 0.9), high variance (50) and more time measurements (7 and 10).
When the explanatory variables are all continuous, or all categorical or binary, the predictions were proven to coincide with those of the classical method. R programs were also written for the classical and distance-based analyses proposed in each chapter of the thesis and are attached on a CD; work is under way on a publicly accessible R package.
The main advantages of these methods are that they allow predictions over time, modeling of the autocorrelation structure, and analysis of data with mixed explanatory variables (continuous, categorical and binary), and that independence of the components of the principal coordinate matrix can always be guaranteed, which is not always the case for the original variables. Finally, the proposed models provide good estimates of missing data: adding one or more components to the model, relative to the original explanatory variables, can improve the fit without altering the original information. They are therefore a useful alternative for longitudinal data analysis and for researchers whose main interest lies in obtaining good predictions.
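A minimal sketch of the distance-based idea — converting a distance matrix on the explanatory variables into principal coordinates that then enter a regression — might look like the Python below. The Euclidean distance, the toy data and the ordinary least-squares fit are assumptions made for illustration; the thesis works with, e.g., the Gower distance for mixed variables and with longitudinal/GEE models, and its own programs are written in R.

import numpy as np

def principal_coordinates(D, k):
    # Classical multidimensional scaling: k principal coordinates from a distance matrix D.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)               # eigendecomposition, eigenvalues ascending
    idx = np.argsort(vals)[::-1][:k]             # keep the k largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Toy explanatory data (continuous only) and a continuous response.
X = np.array([[1.0, 0.2], [1.1, 0.1], [3.0, 2.2], [2.9, 2.0], [5.0, 4.1]])
y = np.array([2.1, 2.0, 4.8, 4.7, 7.2])

D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # Euclidean distance matrix
Z = principal_coordinates(D, k=2)                            # distance-based regressors
Z1 = np.column_stack([np.ones(len(y)), Z])                   # add an intercept column
beta, *_ = np.linalg.lstsq(Z1, y, rcond=None)                # least-squares fit on the coordinates
print(np.round(beta, 3))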
|
148 |
Effective GPS-based panel survey sample size for urban travel behavior studies / Xu, Yanzhi 05 April 2010 (has links)
This research develops a framework to estimate the effective sample size of Global Positioning System (GPS) based panel surveys in urban travel behavior studies for a variety of planning purposes. Recent advances in GPS monitoring technologies have made it possible to implement panel surveys with lengths of weeks, months or even years. The many advantageous features of GPS-based panel surveys make such surveys attractive for travel behavior studies, but the higher cost of such surveys compared to conventional one-day or two-day paper diary surveys requires scrutiny at the sample size planning stage to ensure cost-effectiveness.
The sample size analysis in this dissertation focuses on three major aspects of travel behavior studies: 1) obtaining reliable means for key travel behavior variables, 2) conducting regression analysis of key travel behavior variables against explanatory variables such as demographic characteristics and seasonal factors, and 3) examining the impacts of a policy measure on travel behavior through before-and-after studies. The sample size analyses in this dissertation are based on the GPS data collected in the multi-year Commute Atlanta study. The analysis concerned with obtaining reliable means for key travel behavior variables uses Monte Carlo re-sampling techniques to assess the trend of the means against various combinations of sample size and survey length. The framework and methods for sample size estimation related to regression analysis and before-and-after studies are derived from sample size procedures based on the generalized estimating equation (GEE) method, which were originally proposed for longitudinal studies in biomedical research. This dissertation adapts these procedures to the design of panel surveys for urban travel behavior studies using the information made available by the Commute Atlanta study.
The findings from this research indicate that the required sample sizes should be much larger than the sample sizes in existing GPS-based panel surveys. This research recommends a desired range of sample sizes based on the objectives and survey lengths of urban travel behavior studies.
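The Monte Carlo re-sampling idea described above — checking how stable a mean travel behavior estimate is for different panel sizes — can be sketched roughly as follows in Python; the synthetic daily vehicle-miles-traveled values and the candidate sample sizes are assumptions, not Commute Atlanta data.

import numpy as np

rng = np.random.default_rng(42)

# Synthetic "population" of households: long-run mean daily vehicle miles traveled (VMT).
population_vmt = rng.gamma(shape=2.0, scale=15.0, size=5000)

def resampled_se(sample_size, n_draws=2000):
    # Standard error of the sample mean, estimated by Monte Carlo re-sampling.
    means = [rng.choice(population_vmt, size=sample_size, replace=False).mean()
             for _ in range(n_draws)]
    return float(np.std(means))

for n in (50, 100, 200, 400, 800):
    print(f"n={n:4d}  SE of mean VMT ~ {resampled_se(n):.2f}")
# The SE shrinks roughly as 1/sqrt(n); the smallest n meeting the target precision is the effective sample size.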
|
149 |
Functional data mining with multiscale statistical procedures / Lee, Kichun 01 July 2010 (has links)
The Hurst exponent and the variance are two quantities that often characterize real-life, high-frequency observations. We develop a method for the simultaneous estimation of a time-changing Hurst exponent H(t) and a constant scale (variance) parameter C in a multifractional Brownian motion model in the presence of white noise, based on the asymptotic behavior of the local variation of its sample paths. We also discuss the accuracy of this stable, simultaneous estimator compared with a few selected methods, and the stability of computations that use adapted wavelet filters.
Multifractals have become popular as flexible models for real-life, high-frequency data. We develop a method for testing whether high-frequency data are consistent with monofractality, using meaningful descriptors derived from a wavelet-generated multifractal spectrum. We discuss the theoretical properties of the descriptors, their computational implementation, their use in data mining, and their effectiveness in the context of simulations, an application in turbulence, and the analysis of coding/noncoding regions in DNA sequences.
Wavelet thresholding is a simple and effective operation in wavelet domains that selects a subset of wavelet coefficients from a noisy signal. We propose selecting this subset in a semi-supervised fashion, using a neighbor structure and a classification function appropriate for wavelet domains. The decision to include an unlabeled coefficient in the model depends not only on its magnitude but also on the labeled and unlabeled coefficients in its neighborhood. The theoretical properties of the method are discussed and its performance demonstrated on simulated examples.
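As a rough illustration of how a Hurst exponent can be read off the local variation of a sample path, the Python sketch below estimates a constant H from the log-log slope of the variance of increments across lags. This is a simplified, global version of the time-changing, wavelet-based estimator developed in the thesis, and the synthetic Brownian path is an assumption.

import numpy as np

def estimate_hurst(x, lags=range(1, 20)):
    # Estimate H from var(x[t+lag] - x[t]) ~ lag**(2H) via a log-log regression.
    lags = np.array(list(lags))
    variances = np.array([np.var(x[lag:] - x[:-lag]) for lag in lags])
    slope, _ = np.polyfit(np.log(lags), np.log(variances), 1)
    return slope / 2.0

# Synthetic example: ordinary Brownian motion (cumulative sum of white noise), true H = 0.5.
rng = np.random.default_rng(1)
path = np.cumsum(rng.standard_normal(100_000))
print(round(estimate_hurst(path), 3))   # should come out close to 0.5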
|
150 |
Correlation-based Botnet Detection in Enterprise Networks / Gu, Guofei 07 July 2008 (has links)
Most of the attacks and fraudulent activities on the Internet are carried out by malware. In particular, botnets, as state-of-the-art malware, are now considered the largest threat to Internet security.
In this thesis, we focus on addressing the botnet detection problem in an enterprise-like network environment. We present a comprehensive correlation-based framework for multi-perspective botnet detection consisting of detection technologies demonstrated in four complementary systems: BotHunter, BotSniffer, BotMiner, and BotProbe. The common thread of these systems is correlation analysis, i.e., vertical correlation (dialog correlation), horizontal correlation, and cause-effect correlation. All these Bot* systems have been evaluated in live networks and/or real-world network traces. The evaluation results show that they can accurately detect real-world botnets for their desired detection purposes with a very low false positive rate.
We find that correlation analysis techniques are of particular value for detecting advanced malware such as botnets. Dialog correlation can be effective as long as malware infections need multiple stages. Horizontal correlation can be effective as long as malware tends to be distributed and coordinated. In addition, active techniques can greatly complement passive approaches, if carefully used. We believe our experience and lessons are of great benefit to future malware detection.
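A toy sketch of the horizontal-correlation idea — flagging groups of hosts whose activity patterns are suspiciously similar, as coordinated bots tend to be — is given below in Python. The per-host feature vectors, the cosine similarity and the threshold are invented for illustration and are far simpler than the flow- and activity-based clustering that BotMiner actually performs.

import numpy as np
from itertools import combinations

# Hypothetical per-host activity features: [flows/hour, avg bytes/flow, distinct dst ports, scan attempts].
hosts = {
    "10.0.0.11": np.array([120.0, 310.0, 4.0, 55.0]),
    "10.0.0.12": np.array([118.0, 305.0, 4.0, 53.0]),
    "10.0.0.13": np.array([125.0, 300.0, 5.0, 57.0]),
    "10.0.0.42": np.array([15.0, 50000.0, 40.0, 0.0]),   # ordinary user with a dissimilar profile
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Horizontal correlation: pairs of hosts with near-identical activity profiles are suspicious.
threshold = 0.999
for (h1, v1), (h2, v2) in combinations(hosts.items(), 2):
    if cosine(v1, v2) >= threshold:
        print(f"correlated hosts: {h1} <-> {h2}  (cosine={cosine(v1, v2):.4f})")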
|