121
Determinantes do consumo de frutas, legumes e verduras em adultos residentes no município de São Paulo / Determinants of fruit and vegetable intake in adults living in São Paulo City. Figueiredo, Iramaia Campos Ribeiro, 14 August 2006
Introduction: The incidence of chronic noncommunicable diseases (NCDs) is rising worldwide, and studies show that consumption of fruits and vegetables (FV) reduces the incidence of NCDs in the population. Objective: To evaluate the determinants of FV intake in adults living in São Paulo City. Methods: This is a cross-sectional study, conducted through telephone interviews, covering 1,267 women and 855 men aged 18 years or older. Multiple linear regression analysis was based on a hierarchical model of factors associated with FV intake. The variables were grouped into hierarchical categories ranging from distal to proximal determinants: socio-demographic, behavioral, and food consumption, in that order. Results: For both sexes, the following variables were directly associated with FV intake: age and years of schooling, in the socio-demographic category; leisure-time physical activity and having dieted in the past year, in the behavioral category; and fish consumption, in the food consumption category. Household density was inversely associated with FV intake in both sexes. Among women only, being or having been married was directly associated with FV intake, while smoking was inversely associated. Consumption of foods that indicate an unhealthy dietary pattern, such as sugars and fatty red meat, was inversely associated with FV intake in both sexes. Conclusion: FV intake and its determinants differ between men and women, with the higher frequency of consumption occurring among women.
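As a hedged illustration of the hierarchical modeling strategy described above, the sketch below enters variable blocks in distal-to-proximal order and tracks the fit at each step. The file name and column names are hypothetical placeholders, not the study's actual data.

```python
# Sketch of hierarchical (block-wise) linear regression: variable blocks enter
# in distal-to-proximal order and each block's contribution is assessed.
# All column names below are hypothetical stand-ins for the study's variables.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("fv_survey.csv")  # hypothetical survey extract

blocks = [
    "age + years_of_study + household_density",       # socio-demographic (distal)
    "leisure_activity + dieted_last_year + smoker",   # behavioral
    "fish_intake + sugar_intake + fatty_red_meat",    # food consumption (proximal)
]

formula = "fv_intake ~ 1"
for block in blocks:
    formula += " + " + block
    fit = smf.ols(formula, data=df).fit()
    print(f"block added: {block}\n  adj. R^2 = {fit.rsquared_adj:.3f}")

print(fit.summary())  # final model with all three hierarchical levels
```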
122
Are Highly Dispersed Variables More Extreme? The Case of Distributions with Compact Support. Adjogah, Benedict E, 01 May 2014
We consider discrete and continuous symmetric random variables X taking values in [0, 1], and thus having expected value 1/2. The main thrust of this investigation is to study the correlation between the variance Var(X) of X and the value of the expected maximum E(Mn) = E(max(X1, ..., Xn)) of n independent and identically distributed random variables X1, X2, ..., Xn, each distributed as X. Many special cases are studied, some leading to very interesting alternating sums, and some progress is made towards a general theory.
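A quick Monte Carlo sketch of the question: for symmetric Beta(a, a) variables on [0, 1] (mean 1/2), smaller a means larger variance, and the simulation suggests how E(Mn) grows with dispersion. This is illustrative only; the thesis treats many such cases analytically.

```python
# Monte Carlo check of the relationship between Var(X) and E(M_n) for
# symmetric Beta(a, a) variables on [0, 1]; Var(Beta(a, a)) = 1/(4(2a+1)).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000

for a in [0.5, 1.0, 2.0, 5.0]:            # smaller a => larger Var(X)
    var = 1.0 / (4 * (2 * a + 1))          # exact variance of Beta(a, a)
    samples = rng.beta(a, a, size=(reps, n))
    e_max = samples.max(axis=1).mean()     # Monte Carlo estimate of E(M_n)
    print(f"a={a:4.1f}  Var(X)={var:.4f}  E(M_{n}) ~ {e_max:.4f}")
```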
123
Multilevel Models for Longitudinal Data. Khatiwada, Aastha, 01 August 2016
Longitudinal data arise when individuals are measured several times during an observation period, and thus the data for each individual are not independent. There are several ways of analyzing longitudinal data when different treatments are compared. Multilevel models are used to analyze data that are clustered in some way. In this work, multilevel models are used to analyze longitudinal data from a case study, and the results are compared to those from other, more commonly used methods. The output of two software packages, SAS and R, is also compared. Finally, a method is proposed that fits an individual model for each subject and then performs an ANOVA-type analysis on the estimated parameters of the individual models; its power for different sample sizes and effect sizes is studied by simulation.
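As a rough analogue of the models compared in the thesis (which uses SAS and R), here is a minimal random-intercept multilevel model in Python's statsmodels; the data file and column names are assumed for illustration.

```python
# Minimal two-level (random-intercept) model for longitudinal data:
# repeated measurements are clustered within subjects.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("longitudinal.csv")      # one row per measurement occasion

# Random intercept per subject; add re_formula="~ time" for a random slope too.
model = smf.mixedlm("response ~ time * treatment",
                    data=df, groups=df["subject"])
result = model.fit()
print(result.summary())
```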
124
THE FAMILY OF CONDITIONAL PENALIZED METHODS WITH THEIR APPLICATION IN SUFFICIENT VARIABLE SELECTION. Xie, Jin, 01 January 2018
When scientists know in advance that some features (variables) are important in modeling data, those important features should be kept in the model. How can we use this prior information to effectively find other important features? This dissertation provides a solution that exploits such prior information. We propose the Conditional Adaptive Lasso (CAL) estimator. By choosing a meaningful conditioning set, namely the prior information, CAL shows better performance in both variable selection and model estimation. Based on CAL, we also propose the Sufficient Conditional Adaptive Lasso Variable Screening (SCAL-VS) and Conditioning Set Sufficient Conditional Adaptive Lasso Variable Screening (CS-SCAL-VS) algorithms. The asymptotic and oracle properties are proved. Simulations, especially for large-p-small-n problems, are performed with comparisons against other existing methods. We further extend the method from the linear model setup to generalized linear models (GLMs): instead of least squares, we consider the likelihood function with an L1 penalty, that is, penalized likelihood methods. We propose the Generalized Conditional Adaptive Lasso (GCAL) for generalized linear models, and then extend the method to any penalty term satisfying certain regularity conditions, namely the Conditionally Penalized Estimate (CPE). Asymptotic and oracle properties are shown. Four corresponding sufficient variable screening algorithms are proposed. Simulation examples are evaluated for our method with comparisons against existing methods. GCAL is also evaluated on a real leukemia data set.
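The dissertation's CAL estimator is not reproduced here, so the following is only a generic reconstruction of the idea: an adaptive lasso in which variables in the prior conditioning set receive (near-)zero penalty weights, implemented via the standard column-rescaling trick.

```python
# Sketch of a "conditional" adaptive lasso: variables known a priori to matter
# are left (almost) unpenalized; the rest get adaptive weights from a pilot fit.
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV

def conditional_adaptive_lasso(X, y, keep_idx, gamma=1.0, eps=1e-6):
    init = LinearRegression().fit(X, y).coef_   # pilot OLS estimate (needs n > p)
    w = 1.0 / (np.abs(init) ** gamma + eps)     # adaptive penalty weights
    w[keep_idx] = eps                           # ~zero penalty for the prior set
    X_tilde = X / w                             # lasso on the rescaled design
    lasso = LassoCV(cv=5).fit(X_tilde, y)
    return lasso.coef_ / w                      # back-transform to original scale

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.standard_normal(200)
print(conditional_adaptive_lasso(X, y, keep_idx=[0]).round(2))
```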
125
Automatic 13C Chemical Shift Reference Correction of Protein NMR Spectral Data Using Data Mining and Bayesian Statistical Modeling. Chen, Xi, 01 January 2019
Nuclear magnetic resonance (NMR) is a highly versatile analytical technique for studying molecular configuration, conformation, and dynamics, especially of biomacromolecules such as proteins. However, due to the intrinsic properties of NMR experiments, results from NMR instruments require a referencing step before downstream analysis. Poor chemical shift referencing, especially for 13C in protein NMR experiments, fundamentally limits and even prevents effective study of biomacromolecules via NMR. No available method can re-reference carbon chemical shifts from protein NMR without secondary experimental information such as structure or resonance assignment.
To solve this problem, we constructed a Bayesian probabilistic framework that circumvents the limitations of previous reference correction methods, which required protein resonance assignment and/or a three-dimensional protein structure. Our algorithm, named Bayesian Model Optimized Reference Correction (BaMORC), can detect and correct 13C chemical shift referencing errors before the protein resonance assignment step of analysis and without a three-dimensional structure. By combining the BaMORC methodology with a new intra-peaklist grouping algorithm, we created a combined method called Unassigned BaMORC that utilizes only unassigned experimental peak lists and the amino acid sequence.
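As a toy illustration of the core idea, not BaMORC's actual Bayesian model, the sketch below recovers a global referencing offset by aligning simulated observed shifts with per-residue expected statistics; the reference means and SDs are approximate, literature-style values included only to keep the example self-contained.

```python
# Toy referencing-correction sketch: find the global offset that best aligns
# observed 13C CA shifts with per-residue expected statistics.
import numpy as np

ref = {  # residue -> (approx. mean CA shift, approx. SD), ppm; illustrative only
    "ALA": (53.1, 2.0), "GLY": (45.4, 1.3), "SER": (58.7, 2.1),
}

rng = np.random.default_rng(2)
true_offset = 1.7                                   # referencing error to recover
residues = rng.choice(list(ref), size=300)
observed = np.array([rng.normal(*ref[r]) for r in residues]) + true_offset

mu = np.array([ref[r][0] for r in residues])
sd = np.array([ref[r][1] for r in residues])

def neg_log_lik(offset):
    z = (observed - offset - mu) / sd               # Gaussian NLL up to constants
    return 0.5 * np.sum(z ** 2)

grid = np.arange(-5.0, 5.0, 0.01)
est = grid[np.argmin([neg_log_lik(o) for o in grid])]
print(f"estimated referencing offset: {est:.2f} ppm")  # close to 1.7
```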
Unassigned BaMORC kept all experimental three-dimensional HN(CO)CACB-type peak lists tested within ± 0.4 ppm of the correct 13C reference value. On a much larger unassigned chemical shift test set, the base method kept 13C chemical shift referencing errors to within ± 0.45 ppm at a 90% confidence interval. With chemical shift assignments, Assigned BaMORC can detect and correct 13C chemical shift referencing errors to within ± 0.22 ppm at a 90% confidence interval. Therefore, Unassigned BaMORC can correct 13C chemical shift referencing errors when it will have the most impact, right before protein resonance assignment and other downstream analyses begin. After assignment, the chemical shift reference correction can be further refined with Assigned BaMORC.
To further support broader usage of these new methods, we also created a software package with a web-based interface for the NMR community. This software allows non-NMR experts to detect and correct 13C referencing errors at critical early data analysis steps, lowering the bar of NMR expertise required for effective protein NMR analysis.
126
EFFECT OF SOCIOECONOMIC AND DEMOGRAPHIC FACTORS ON KENTUCKY CRASHES. Cambron, Aaron Berry, 01 January 2018
The goal of this research was to examine the potential of socioeconomic and demographic driver data to predict crash occurrence in Kentucky. Identifying unique background characteristics of at-fault drivers that contribute to crash rates and crash severity may lead to improved and more specific interventions to reduce the negative impacts of motor vehicle crashes. The driver-residence ZIP code was used as a spatial unit to connect five years of Kentucky crash data with socioeconomic factors from the U.S. Census, such as income, employment, education, and age, along with terrain and vehicle age. At-fault driver crash counts, normalized by the driving population, served as the dependent variable in a multivariate linear regression relating socioeconomic variables to motor vehicle crashes. The final model consisted of nine socioeconomic and demographic variables and yielded an R-squared of 0.279, which indicates a linear correlation but limited predictive power. The model produced both positive and negative correlations between socioeconomic variables and crash rates. Positive associations were found with the terrain index (a composite measure of road curviness), travel time, high school graduation, and vehicle age. Negative associations were found with younger drivers, unemployment, college education, and terrain difference, which compares the terrain index at the driver residence and at the crash location. Further research seems warranted to fully understand the role that socioeconomic and demographic characteristics play in driving behavior and crash risk.
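A minimal sketch of the regression just described, assuming a hypothetical ZCTA-level data file with stand-in column names for the study's covariates:

```python
# ZIP-code-level regression: normalized at-fault crash counts regressed on
# socioeconomic/terrain covariates (column names are hypothetical).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("zcta_crash_data.csv")   # one row per driver-residence ZIP code

fit = smf.ols(
    "crash_rate ~ terrain_index + travel_time + hs_graduation + vehicle_age"
    " + pct_young_drivers + unemployment + college_education + terrain_diff",
    data=df,
).fit()
print(fit.rsquared)    # the thesis reports R-squared = 0.279 for its final model
print(fit.summary())
```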
127
Lifetime value modelling. Van der Westhuizen, Frederick Jacques, January 2009
Given the increase in popularity of Lifetime Value (LTV), the argument is that the topic will assume an increasingly central role in research and marketing. Hence the decision to assess the state of the field in Lifetime Value modelling and to outline challenges unique to choice researchers in customer relationship management (CRM). As the research argues, an excess of issues and analytical challenges remains unresolved, and the researcher hopes that this thesis inspires new answers and new approaches to resolving LTV. The scope of this project covers the building of an LTV model through multiple regression, with an exclusive focus on modelling tenure. In this regard, there is a variety of benchmark statistical techniques arising from survival analysis that could be applied to tenure modelling. Tenure prediction is examined using survival analysis and compared with "crossbreed" data mining techniques that use multiple regression in concurrence with statistical techniques. It is demonstrated how data mining tools complement the statistical models, and that their mutual usage overcomes many of the shortcomings of each singular tool set, resulting in LTV models that are both accurate and comprehensible. Bank XYZ is used as an example and is based on a real scenario at one of the banks of South Africa. / Thesis (M.Sc. (Computer Science))--North-West University, Vaal Triangle Campus, 2009.
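For a sense of the survival-analysis benchmarks mentioned above, here is a minimal tenure sketch using the lifelines library (Kaplan-Meier estimate plus a Cox proportional hazards model); the customer data file and covariates are assumed for illustration.

```python
# Tenure modelling via survival analysis: non-parametric Kaplan-Meier estimate
# of customer survival, then a Cox model relating covariates to churn hazard.
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

df = pd.read_csv("customers.csv")  # tenure_months, churned (1/0), covariates

kmf = KaplanMeierFitter()
kmf.fit(df["tenure_months"], event_observed=df["churned"])
print(kmf.median_survival_time_)   # typical customer tenure

cph = CoxPHFitter()
cph.fit(df[["tenure_months", "churned", "age", "balance"]],
        duration_col="tenure_months", event_col="churned")
cph.print_summary()
```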
128
Bayesian Logistic Regression Model for Siting Biomass-using Facilities. Huang, Xia, 01 December 2010
Key sources of oil for western markets are located in complex geopolitical environments that increase economic and social risk. The amalgamation of economic, environmental, social, and national security concerns for petroleum-based economies has created a renewed emphasis on alternative sources of energy, which include biomass. The stability of sustainable biomass markets hinges on improved methods to predict and visualize business risk and cost to the supply chain.
This thesis develops Bayesian logistic regression models, with comparisons to classical maximum likelihood models, to quantify significant factors that influence the siting of biomass-using facilities and to predict potential locations in the 13-state Southeastern United States for three types of biomass-using facilities. Group I combined all biomass-using mills, biorefineries using agricultural residues, and wood-using bioenergy/biofuels plants. Group II included pulp and paper mills and biorefineries that use agricultural and wood residues. Group III included food processing mills and biorefineries that use agricultural and wood residues. The resolution of this research is the 5-digit ZIP Code Tabulation Area (ZCTA); there are 9,416 ZCTAs in the 13-state Southeastern study region.
For both the classical and Bayesian approaches, the data were split into a training set and a separate validation (hold-out) set using a pseudo-random number-generating function in SAS® Enterprise Miner. Four predefined priors were constructed. Bayesian estimation assuming a Gaussian prior distribution provides the highest correct classification rate of 86.40% for Group I; Bayesian methods assuming a non-informative uniform prior provide the highest correct classification rate of 95.97% for Group II; and Bayesian methods assuming a Gaussian prior give the highest correct classification rate of 92.67% for Group III. Given the comparatively low sensitivity for Groups II and III, a hybrid model that integrates classification trees and local Bayesian logistic regression was developed as part of this research to further improve the predictive power. The hybrid model increases the sensitivity of Group II from 58.54% to 64.40%, and significantly improves both the specificity and sensitivity for Group III, from 98.69% to 99.42% and from 39.35% to 46.45%, respectively. Twenty-five optimal locations for the biomass-using facility groupings at the 5-digit ZCTA resolution, based upon the best-fitted Bayesian logistic regression model and the hybrid model, are predicted and plotted for the 13-state Southeastern study region.
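To illustrate the role of the Gaussian prior, here is a hedged MAP (maximum a posteriori) sketch of Bayesian logistic regression on simulated data; the thesis itself uses full Bayesian estimation in SAS® Enterprise Miner, so this is only a conceptual stand-in.

```python
# MAP estimate for Bayesian logistic regression with a Gaussian N(0, tau^2)
# prior on the coefficients (the prior family that performed best above).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(3)
X = rng.standard_normal((500, 4))
y = rng.binomial(1, expit(X @ np.array([1.0, -2.0, 0.5, 0.0])))

def neg_log_posterior(beta, tau=1.0):
    p = expit(X @ beta)
    log_lik = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    log_prior = -0.5 * np.sum(beta ** 2) / tau ** 2   # Gaussian prior
    return -(log_lik + log_prior)

beta_map = minimize(neg_log_posterior, np.zeros(4), method="BFGS").x
print(beta_map.round(2))   # shrunk toward zero relative to maximum likelihood
```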
129
A Study of Missing Data Imputation and Predictive Modeling of Strength Properties of Wood Composites. Zeng, Yan, 01 August 2011
Problem: Real-time process and destructive test data were collected from a wood composite manufacturer in the U.S. to develop real-time predictive models of two key strength properties (Modulus of Rupture (MOR) and Internal Bond (IB)) of a wood composite manufacturing process. Sensor malfunction and data "send/retrieval" problems led to null fields in the company's data warehouse, which resulted in information loss. Many manufacturers attempt to build accurate predictive models by excluding entire records with null fields or by using summary statistics such as the mean or median in place of the null field. However, predictive model errors in validation may be higher in the presence of information loss. In addition, the selection of predictive modeling methods poses another challenge to many wood composite manufacturers.
Approach: This thesis consists of two parts addressing the above issues: 1) how to improve data quality using missing data imputation; and 2) which predictive modeling method is better in terms of prediction precision (measured by root mean square error, or RMSE). The first part summarizes an application of missing data imputation methods in predictive modeling. After variable selection, two missing data imputation methods were selected from six candidate methods. Predictive models on imputed data were developed using partial least squares regression (PLSR) and compared with models on non-imputed data using ten-fold cross-validation; the root mean square error of prediction (RMSEP) and normalized RMSEP (NRMSEP) were calculated. The second part presents a series of comparisons among four predictive modeling methods using imputed data without variable selection.
Results: The first part concludes that the expectation-maximization (EM) algorithm and multiple imputation (MI) using Markov chain Monte Carlo (MCMC) simulation achieved the most precise results. Predictive models based on imputed datasets generated more precise predictions (average NRMSEP of 5.8% for the MOR model and 7.2% for the IB model) than models based on non-imputed datasets (average NRMSEP of 6.3% for MOR and 8.1% for IB). The second part finds that Bayesian Additive Regression Trees (BART) produced more precise predictions (average NRMSEP of 7.7% for the MOR model and 8.6% for the IB model) than the other three methods: PLSR, LASSO, and adaptive LASSO.
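A sketch of the comparison pattern used in the first part, assuming a hypothetical process-data file: mean imputation versus an iterative, EM/MI-flavoured imputer, each followed by PLSR and scored by ten-fold cross-validated RMSEP.

```python
# Imputation-then-PLSR comparison: each imputer feeds a partial least squares
# model, scored by cross-validated prediction RMSE (hypothetical data file).
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

df = pd.read_csv("process_data.csv")       # sensor columns with null fields
X, y = df.drop(columns="MOR"), df["MOR"]   # predict Modulus of Rupture

for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                      ("iterative", IterativeImputer(max_iter=20, random_state=0))]:
    pipe = make_pipeline(imputer, PLSRegression(n_components=5))
    rmse = -cross_val_score(pipe, X, y, cv=10,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name:9s} imputation: 10-fold RMSEP = {rmse:.3f}")
```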
130
Generalized Bathtub Hazard Models for Binary-Transformed Climate Data. Polcer, James, 01 May 2011
In this study, we use hazard-based modeling as an alternative statistical framework to time series methods as applied to climate data. Data collected from the Kentucky Mesonet are used to study the distributional properties of the duration of high- and low-energy wind events relative to an arbitrary threshold. Our objectives were to fit bathtub models proposed in the literature, propose a generalized bathtub model, apply these models to Kentucky Mesonet data, and make recommendations as to the feasibility of wind power generation. Using two different thresholds (1.8 and 10 mph, respectively), results show that the Hjorth bathtub model consistently performed better than all other models considered, with R-squared values of 0.95 or higher. However, fewer sites and months could be included in the analysis when the threshold was increased to 10 mph. Based on the 10 mph threshold, Bowling Green (FARM), Hopkinsville (PGHL), and Columbia (CMBA) posted the top three wind duration times in February 2009. Further studies are needed to establish long-term trends.
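For reference, a minimal maximum-likelihood sketch of the Hjorth bathtub hazard, using the common parameterization h(t) = δt + θ/(1 + βt) with survival S(t) = exp(−δt²/2)/(1 + βt)^(θ/β); the durations below are simulated stand-ins for wind-event data.

```python
# MLE for the Hjorth bathtub hazard h(t) = d*t + c/(1 + b*t), fitted via the
# density f(t) = h(t) * S(t), with S(t) = exp(-d t^2 / 2) / (1 + b t)^(c/b).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
t = rng.gamma(shape=2.0, scale=3.0, size=1000)  # stand-in duration data

def neg_log_lik(params):
    d, b, c = np.exp(params)                    # log-parameterize for positivity
    hazard = d * t + c / (1 + b * t)
    log_surv = -d * t ** 2 / 2 - (c / b) * np.log1p(b * t)
    return -np.sum(np.log(hazard) + log_surv)

res = minimize(neg_log_lik, np.log([0.1, 0.1, 0.1]), method="Nelder-Mead")
print(np.exp(res.x))                            # fitted (delta, beta, theta)
```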