101 |
Exploring Fit for Nonlinear Structural Equation Models
Pfleger, Phillip Isaac 01 April 2019
Fit indices and fit measures commonly used to determine the accuracy and desirability of structural equation models are expected to be insensitive to nonlinearity in the data. This includes measures as ubiquitous as the CFI, TLI, RMSEA, SRMR, AIC, and BIC. Despite this, some software will report these measures when certain models are used. Consequently, some researchers may be led to use these fit measures without realizing the impropriety of the act. Alternative fit measures have been proposed, but these measures require further testing. As part of this thesis, a large simulation study was carried out to investigate alternative fit measures and to confirm whether the traditional measures are practically blind to nonlinearity in the data. The results of the simulation provide conclusive evidence that fit statistics and fit indices based on the chi-square distribution or the residual covariance matrix are entirely insensitive to nonlinearity. The posterior predictive p-value was also insensitive to nonlinearity. Only fit measures based on the structural residuals (i.e., HFI and R-squared) showed any sensitivity to nonlinearity. Of these, the R-squared was the only reliable measure of nonlinear model misspecification. This thesis shows that an effective strategy for determining whether a nonlinear model is preferable to a linear one involves using the R-squared to compare models that have been fit to the same data. An R-squared that is much larger for the nonlinear model than the linear model suggests that the linear model may be less desirable than the nonlinear model. The proposed method is intended to be supplementary to substantive theory. It is argued that any dependence on fit indices or fit statistics that places these measures on a higher pedestal than substantive theory will invariably lead to blindness on the part of the researcher. 
In other words, unwavering adherence to goodness-of-fit measures limits the researcher's vision to what the measures themselves can detect.
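The comparison strategy described in this abstract (fit a linear and a nonlinear model to the same data, then compare R-squared) can be illustrated with a minimal sketch. It assumes an ordinary observed-variable regression with a quadratic term rather than the latent-variable structural models studied in the thesis; the simulated data, the coefficients, and the helper names `r_squared` and `fit_polynomial` are all hypothetical.

```python
import numpy as np


def r_squared(y, y_hat):
    """Proportion of the variance in y explained by the fitted values y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit of the given degree; returns fitted values."""
    coefs = np.polyfit(x, y, degree)
    return np.polyval(coefs, x)


# Simulated data with a genuinely quadratic relationship (an assumption made
# purely for illustration; these are not the thesis's simulation conditions).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.5 * x + 0.7 * x**2 + rng.normal(scale=0.5, size=500)

r2_linear = r_squared(y, fit_polynomial(x, y, 1))
r2_quadratic = r_squared(y, fit_polynomial(x, y, 2))
print(f"R-squared, linear model:    {r2_linear:.3f}")
print(f"R-squared, quadratic model: {r2_quadratic:.3f}")
```

Because the quadratic model nests the linear one, its R-squared can never be smaller on the same data; it is the size of the gap, read alongside substantive theory, that carries the information.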
|
102 |
Factors Related to Diabetes Mellitus among Asian-American Adults in the United States Using the 2011 to 2020 National Health and Nutrition Examination Survey
Nichols, Quentin Zacharias 01 September 2023
Type 2 diabetes mellitus (T2DM) disproportionately affects under-represented groups, specifically Asian Americans. Asian Americans are less likely to receive proper diabetes mellitus screening compared to other racial and ethnic groups, potentially due to improper screening guidelines by clinicians and to Asian Americans being unaware of their increased risk for diabetes mellitus. There are differences in the etiology of T2DM in Asian Americans compared to White Americans. Due to the increasing rates of T2DM among Asian Americans, new approaches in the screening of T2DM should be tailored based on race and ethnicity. The aging process is frequently associated with decreased muscle mass and increased adipose tissue, which can contribute to insulin resistance and lead to elevated hemoglobin A1c (HbA1c) percentages. Although sex has not been classified as an independent risk factor for T2DM, it is important to consider sex-specific conditions in the context of the disease. Body mass index (BMI) alone is insufficient to properly evaluate adiposity in Asian-American adults because Asian Americans tend to have a lower BMI with a higher body fat percentage. Waist circumference, waist-to-height ratio (WHtR), and visceral adiposity index (VAI) may be better for screening Asian Americans for T2DM. Multiple modifiable risk factors, such as sedentary behavior and dietary intake (specifically dietary magnesium intake), can increase the risk for T2DM. Lack of physical activity can result in insulin resistance and impaired glucose metabolism as a result of muscle disuse and decreased lean body mass. Half of the Asian-American population is not consuming the recommended amounts of magnesium from foods, drinks, and dietary supplements. There is an inverse relationship between dietary magnesium intake and the risk of T2DM. In addition, the main language spoken in the household may influence lifestyle and risk of T2DM.
The overarching goal of the present study was to establish which independent variables (age, sex, BMI, waist circumference, WHtR, VAI, sedentary behavior time, dietary magnesium intake, self-reported healthy diet status, and language) were the strongest predictors of HbA1c percentage (a measure of blood glucose control) in Asian-American adults using the National Health and Nutrition Examination Survey (NHANES) data from 2011 to 2020. The present study also evaluated the relationship among multiple predictors of HbA1c percentage, including age, sex, body composition, sedentary behavior time, dietary magnesium intake, self-reported healthy diet status, and language among Asian-American adults, 18 years of age and older, using the NHANES data from 2011 to 2020. / Doctor of Philosophy / Asian Americans have been disproportionately affected by type 2 diabetes mellitus (T2DM). Compared to other racial and ethnic groups, Asian Americans are less likely to receive proper diabetes mellitus screening. This may be due to inadequate screening guidelines and lack of awareness about their increased risk for diabetes mellitus. The cause of T2DM in Asian Americans differs from that in White Americans, which calls for tailored screening approaches based on race and ethnicity. The aging process is frequently associated with decreased muscle mass and increased adipose tissue, which can contribute to insulin resistance and lead to elevated hemoglobin A1c (HbA1c) percentages. Although sex has not been classified as an independent risk factor for T2DM, it is important to consider sex-specific conditions in the context of the disease. Body mass index (BMI) alone is not enough to accurately assess body fat in Asian-American adults, because they tend to have a lower BMI, but higher body fat percentage. Waist circumference, waist-to-height ratio (WHtR), and visceral adiposity index (VAI) might be more suitable for screening Asian Americans for T2DM. 
Several modifiable risk factors, such as a sedentary lifestyle and dietary intake (specifically, dietary magnesium intake), can increase the risk of T2DM. Lack of physical activity can lead to insulin resistance and impaired glucose metabolism due to muscle disuse and reduced lean body mass. Half of the Asian-American population does not consume the recommended amounts of magnesium from food, drinks, and dietary supplements. Researchers have shown that increased dietary magnesium intake is linked to a reduced risk of T2DM. In addition, the main language spoken in the household may influence lifestyle and risk of T2DM. The main goal of this study was to identify which factors (age, sex, BMI, waist circumference, WHtR, VAI, sedentary behavior time, dietary magnesium intake, self-reported healthy diet status, and language) were the strongest predictors of HbA1c percentage (a measure of blood glucose control) in Asian Americans. This was completed using the National Health and Nutrition Examination Survey (NHANES) data from 2011 to 2020. Additionally, the study aimed to establish the relationship among multiple predictors of HbA1c percentage, including age, sex, body composition, sedentary behavior time, dietary magnesium intake, self-reported healthy diet status, and language among Asian-American adults, 18 years of age and older, using the same NHANES data.
|
103 |
Exploration and Statistical Modeling of Profit
Gibson, Caleb 01 December 2023
For any company involved in sales, maximization of profit is the driving force that guides all decision-making. Many factors can influence how profitable a company can be, including external factors like changes in inflation or consumer demand and internal factors like pricing and product cost. By understanding specific trends in its own internal data, a company can readily identify problem areas or potential growth opportunities to help increase profitability.
In this discussion, we use an extensive data set to examine how a company might analyze its own data to identify potential changes to investigate in order to drive better performance. Based upon general trends in the data, we recommend potential actions the company could take. Additionally, we examine how a company can use predictive modeling to adapt its decision-making process as the trends identified in the initial analysis of the data evolve over time.
|
104 |
STATISTICAL METHODS FOR VARIABLE SELECTION IN THE CONTEXT OF HIGH-DIMENSIONAL DATA: LASSO AND EXTENSIONS
Yang, Xiao Di 10 1900
With the advance of technology, the collection and storage of data have become routine. Huge amounts of data are increasingly produced by biological experiments. The advent of DNA microarray technologies has enabled scientists to measure the expression of tens of thousands of genes simultaneously. Single nucleotide polymorphisms (SNPs) are being used in genetic association studies with a wide range of phenotypes, for example, complex diseases. These high-dimensional problems are becoming more and more common. The "large p, small n" problem, in which there are more variables than samples, is currently a challenge that many statisticians face. Penalized variable selection is an effective way to deal with the "large p, small n" problem. In particular, the Lasso (least absolute shrinkage and selection operator) proposed by Tibshirani has become an effective method for this type of problem. The Lasso works well for covariates that can be treated individually. When the covariates are grouped, it does not work well. Elastic net, group lasso, group MCP and group bridge are extensions of the Lasso. Group lasso enforces sparsity at the group level, rather than at the level of the individual covariates. Group bridge and group MCP produce sparse solutions both at the group level and at the level of the individual covariates within a group. Our simulation study shows that the group lasso forces complete grouping, group MCP encourages grouping to a rather slight extent, and group bridge is somewhere in between. If one expects the proportion of nonzero group members to be greater than one-half, group lasso may be a good choice; otherwise group MCP would be preferred. If one expects this proportion to be close to one-half, one may wish to use group bridge. A real data analysis is also conducted on genetic variation (SNP) data to identify associations between SNPs and West Nile disease. / Master of Science (MSc)
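To make the "large p, small n" setting concrete, the following is a minimal sketch of the plain Lasso fitted by cyclic coordinate descent with soft-thresholding. It is not the thesis's code and does not cover the grouped extensions (group lasso, group MCP, group bridge); the objective scaling, the penalty value `lam`, the simulated design, and the function names are assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t, clipping at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * ||x_j||^2 for each column
    resid = y.astype(float).copy()      # residual y - X @ beta, with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_beta_j = soft_threshold(rho, lam) / col_sq[j]
            # Keep the residual in sync with the updated coefficient.
            resid += X[:, j] * (beta[j] - new_beta_j)
            beta[j] = new_beta_j
    return beta


# Hypothetical "large p, small n" data: 200 covariates, 40 samples,
# and only the first three covariates truly related to the response.
rng = np.random.default_rng(0)
n, p = 40, 200
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + rng.normal(scale=0.1, size=n)

beta_hat = lasso_cd(X, y, lam=0.3)
print("number of nonzero coefficients:", np.count_nonzero(beta_hat))
print("first three estimates:", np.round(beta_hat[:3], 2))
```

The soft-thresholding step is what sets individual coefficients exactly to zero; the grouped penalties discussed in the abstract apply an analogous shrinkage to whole blocks of coefficients.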
|
105 |
Modelling Gender Disparities in Football: The Impact of Machine Learning Models Trained on Gender-Specific Data
Gådin, Douglas, Winman, Johan January 2024
This thesis investigates the performance of machine learning models trained on gender-specific datasets in the context of football analytics, with a focus on the development of expected goals (xG), expected threat (xT), and expected passes (xP) models. Utilizing comprehensive match data from Sweden’s top football leagues, Allsvenskan (men’s) and Damallsvenskan (women’s), over five seasons (2018–2022), this research assesses the accuracy of these models in predicting match outcomes and player performance indicators when tailored to specific genders. By comparing the efficacy of models trained on male-only, female-only, and mixed-gender datasets, the study aims to determine the optimal data training approach for each model. The findings indicate that gender-specific training significantly enhances model performance, particularly in the context of women’s football, highlighting distinct gameplay dynamics and strategic implementations between genders. This thesis underscores the critical role of gender-specific analytical models in sports analytics, proposing that such tailored approaches can lead to more precise predictions and equitable analysis in sports, thereby supporting initiatives toward gender equality in athletic representation and research.
|
106 |
Juvenile River Herring in Freshwater Lakes: Sampling Approaches for Evaluating Growth and Survival
Devine, Matthew T 27 October 2017
River herring, collectively alewives (Alosa pseudoharengus) and blueback herring (A. aestivalis), have experienced substantial population declines over the past five decades due in large part to overfishing, combined with other sources of mortality, and disrupted access to critical freshwater spawning habitats. Anadromous river herring populations are currently assessed by counting adults in rivers during upstream spawning migrations, but no field-based assessment methods exist for estimating juvenile densities in freshwater nursery habitats. Counts of 4-year-old migrating adults are variable and prevent an understanding of how mortality acts on different life stages prior to returning to spawn (e.g., juveniles and immature adults in lakes, rivers, estuaries, and oceans). This in turn makes it challenging to infer a link between adult counts and juvenile recruitment and to develop effective management policy. I used a pelagic purse seine to investigate juvenile river herring densities, growth, and mortality across 16 New England lakes. First, I evaluated the effectiveness and sampling precision of a pelagic purse seine for capturing juvenile river herring in lakes, since this sampling gear has not been systematically tested. Sampling at night in June or July resulted in the highest catches. The coefficient of variation, used as the measure of precision, was lowest in July (0.23) compared to June (0.32), August (0.38), and September (0.61), meaning that sampling was most precise in July. Simulation results indicated that the effort required to produce precise density estimates is largely dependent on lake size, with small lakes (<50 ha) requiring up to 10 purse seine hauls and large lakes (>50 ha) requiring 15–20 hauls. These results suggested that juvenile recruitment densities can be effectively measured using a purse seine at night in June or July with 10–20 hauls. Using juvenile fishes captured during purse seining in June–September 2015, I calculated growth and mortality rates from sagittal otoliths.
Density, growth, and mortality were highly variable among lakes, and mixed-effects regression models explained 11%–76% of the variance in these estimates. Juvenile densities ranged over an order of magnitude and were inversely related to dissolved organic carbon. Juvenile growth rates were higher in productive systems (i.e., low Secchi depth, high nutrients) and were strongly density-dependent, leading to much larger fish at age in productive lakes with low densities of river herring compared to high-density lakes. Water temperature explained 56%–85% of the variation in juvenile growth rates during the first 30 days of life. Mortality was positively related to total phosphorus levels and inversely related to hatch date, with earlier-hatching cohorts experiencing higher mortality. These results indicate the importance of water quality and juvenile densities in nursery habitats for determining juvenile growth and survival. This study encourages future assessments of juvenile river herring in freshwater and contributes to an understanding of factors explaining juvenile recruitment that can guide more effective and comprehensive management of river herring.
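The relationship between the number of purse seine hauls and sampling precision can be sketched as follows. The abstract does not state how the coefficient of variation was computed or how the simulation was designed, so this sketch assumes the CV is the standard error of the mean catch divided by the mean catch, and it assumes negative-binomial catches; the mean catch, the dispersion value, and the function names are hypothetical.

```python
import numpy as np


def cv_of_mean(catches):
    """Coefficient of variation of the mean catch: standard error / mean."""
    catches = np.asarray(catches, dtype=float)
    standard_error = catches.std(ddof=1) / np.sqrt(catches.size)
    return standard_error / catches.mean()


def expected_cv(n_hauls, mean_catch, dispersion, n_sim=2000, seed=0):
    """Average CV over simulated surveys with negative-binomial catches.

    NumPy parameterizes the negative binomial by (n, p); choosing
    p = dispersion / (dispersion + mean_catch) gives the requested mean.
    """
    rng = np.random.default_rng(seed)
    p = dispersion / (dispersion + mean_catch)
    sims = rng.negative_binomial(dispersion, p, size=(n_sim, n_hauls))
    means = sims.mean(axis=1)
    standard_errors = sims.std(axis=1, ddof=1) / np.sqrt(n_hauls)
    valid = means > 0  # skip simulated surveys that caught nothing at all
    return float(np.mean(standard_errors[valid] / means[valid]))


# Hypothetical patchy catches: mean of 50 fish per haul, dispersion of 1.
for hauls in (5, 10, 15, 20):
    print(hauls, "hauls -> expected CV", round(expected_cv(hauls, 50, 1.0), 2))
```

Under these assumptions the expected CV falls as hauls are added, which is the kind of trade-off a sampling-effort simulation is meant to quantify.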
|
107 |
Landslide Susceptibility and Tree Ring Eccentricity Analysis Along Unstable Slopes of the New River Watershed, Anderson and Morgan Counties, TN
Palmer, Megan 01 May 2024
Landslides are mass movements that affect infrastructure across East Tennessee, causing problems for the Tennessee Department of Transportation (TDOT). An assessment of the conditions and locations of unstable slopes can aid TDOT in infrastructure management. Landslide susceptibility was evaluated for Anderson and Morgan counties, TN, off State Route 116 in the New River watershed. Susceptibility maps used a landslide inventory and six factors (elevation, slope, geology, distance from stream, rainfall, and curvature) as inputs to forest-based classification and logistic regression models. Additionally, affected trees along these unstable slopes in Anderson and Morgan counties were cored to analyze mass movement impacts on tree rings. This research demonstrates the importance of the causative factors used to model landslide-susceptible areas and how tree rings can carry the signature of landslides. These two studies can aid TDOT's mitigation practices, and the landslide susceptibility research can potentially be applied to other parts of East Tennessee.
|
108 |
Prediction of project yield and project success in the construction sector using statistical models
Wolf-Watz, Max, Zakrisson, Benjamin January 2024
The construction sector is marked by uncertainty, where cash flow prediction, time delays, and complex feature interaction make it hard to predict which future projects will be profitable. This thesis explores the prediction of project yield and project success for a company in the construction industry using supervised learning models. Leveraging historical project data, traditional parametric regression and machine learning techniques are employed to develop predictive models for project yield and project success. The models were chosen based on previous related work and consultations with employees with domain knowledge in the industry. The study aims to identify the most effective modeling approach for yield prediction and success in construction projects through comprehensive analysis and comparison. The features influencing project yield are investigated using SHAP (SHapley Additive exPlanations) and permutation feature importance (PFI) values. These explainability techniques provide insights into feature importance in the models, thereby enhancing the understanding of the underlying factors driving project yield and project success. The results of this research contribute to the advancement of predictive modeling in the construction industry, offering valuable insights for project planning and decision-making. Construction companies can optimize resource allocation, mitigate risks, and improve project outcomes by accurately predicting project yield and success and understanding the key factors influencing them. The results in this thesis reveal that the machine-learning models outperform the parametric models overall. The best-performing models, based primarily on accuracy and ROI, were the random forest models with both binary and continuous outputs, leading to a suggested data-driven guideline for the company to use in its project decision-making process.
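Permutation feature importance, one of the two explainability techniques named in this abstract, can be sketched in a few lines: shuffle one feature at a time and record how much the model's error increases. The sketch below is model-agnostic and, for brevity, uses a least-squares linear model in place of the random forests reported in the thesis; the simulated features, coefficients, and the `permutation_importance` helper are hypothetical and are not the thesis's implementation.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in mean squared error when each column of X is shuffled."""
    rng = np.random.default_rng(seed)

    def mse(actual, predicted):
        return float(np.mean((actual - predicted) ** 2))

    baseline = mse(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between feature j and the response.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(mse(y, predict(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances


# Hypothetical project data: three features with strong, weak, and no effect.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Stand-in model: ordinary least squares fitted with NumPy.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pfi = permutation_importance(lambda A: A @ coef, X, y)
print("permutation importances:", np.round(pfi, 3))
```

Because the helper only needs a `predict` callable, the same function could be pointed at any fitted model, including a tree ensemble.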
|
109 |
Some mixture models for the joint distribution of stock's return and trading volume
Wong, Po-shing., 黃寶誠. January 1991
published_or_final_version / Statistics / Master / Master of Philosophy
|
110 |
Flipping Biological Switches: Solving for Optimal Control: A Dissertation
Chang, Joshua TsuKang 30 March 2015
Switches play an important regulatory role at all levels of biology, from molecular switches triggering signaling cascades to cellular switches regulating cell maturation and apoptosis. Medical therapies are often designed to toggle a system from one state to another, achieving a specified health outcome. For instance, small doses of subpathologic viruses activate the immune system’s production of antibodies. Electrical stimulation reverts cardiac arrhythmias back to normal sinus rhythm. In all of these examples, a major challenge is finding the optimal stimulus waveform necessary to cause the switch to flip. This thesis develops, validates, and applies a novel model-independent stochastic algorithm, the Extrema Distortion Algorithm (EDA), towards finding the optimal stimulus. We validate the EDA’s performance for the Hodgkin-Huxley model (an empirically validated ionic model of neuronal excitability), the FitzHugh-Nagumo model (an abstract model applied to a wide range of biological systems that exhibit an oscillatory state and a quiescent state), and the genetic toggle switch (a model of bistable gene expression). We show that the EDA is able not only to find the optimal solution, but also in some cases to excel beyond the traditional analytic approaches. Finally, we have computed novel optimal stimulus waveforms for aborting epileptic seizures using the EDA in cellular and network models of epilepsy. This work represents a first step in developing a new class of adaptive algorithms and devices that flip biological switches, revealing basic mechanistic insights and therapeutic applications for a broad range of disorders.
|