1 |
Predicting Alzheimer Disease Status Using High-Dimensional MRI Data Based on LASSO Constrained Generalized Linear Models
Salah, Zainab 08 August 2017
Introduction: Alzheimer’s disease is an irreversible brain disorder characterized by distortion of memory and other mental functions. Although several psychometric tests are available for the diagnosis of Alzheimer’s, there is great concern about the validity of these tests at recognizing the early onset of the disease. Currently, brain magnetic resonance imaging (MRI) is not commonly utilized in the diagnosis of Alzheimer’s, because researchers are still puzzled by the association of brain regions with the disease status and its progress. Moreover, MRI data tend to be high dimensional, requiring advanced statistical methods to analyze them accurately. In the past decade, the application of the Least Absolute Shrinkage and Selection Operator (LASSO) has become increasingly popular in the analysis of high dimensional data. With LASSO, only a small number of the regression coefficients are believed to have a non-zero value and are therefore allowed to enter the model, while the others are shrunk to zero.
Aim: Determine the non-zero regression coefficients in models predicting patients’ classification (Normal, mild cognitive impairment (MCI), or Alzheimer’s) using both non-ordinal and ordinal LASSO.
Methods: Pre-processed high dimensional MRI data from the Alzheimer’s Disease Neuroimaging Initiative were analyzed. Predictors of the following models were differentiated: Alzheimer’s vs. normal, Alzheimer’s vs. normal and MCI, Alzheimer’s and MCI vs. normal. Cross-validation followed by ordinal LASSO was executed on these same sets of models.
Results: Results were inconclusive. Two brain regions, the frontal lobe and the putamen, appeared more frequently in the models than any other region. Non-ordinal multinomial models performed better than ordinal multinomial models, with higher accuracy, sensitivity, and specificity rates. The majority of the models were better suited to predicting MCI status than the other two statuses.
Discussion: Future research on brain MRI for Alzheimer’s disease classification should explore the other stages of the disease, different statistical analysis methods such as the elastic net, and larger sample sizes.
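The shrink-to-zero behavior of LASSO described in the introduction can be illustrated with the soft-thresholding operator, which gives the LASSO solution in the special case of an orthonormal design matrix. This is a minimal sketch under that assumption, not the models fitted in the thesis; the coefficient values and the penalty `lam` are hypothetical.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; return exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficients, for illustration only.
ols_coefs = [3.0, -0.4, 0.9, -2.5]
lam = 1.0  # assumed penalty weight

# With an orthonormal design, each LASSO coefficient is the
# soft-thresholded unpenalized coefficient.
lasso_coefs = [soft_threshold(b, lam) for b in ols_coefs]

# Indices of predictors that remain in the model (non-zero coefficients).
selected = [j for j, b in enumerate(lasso_coefs) if b != 0.0]
```

Small coefficients are set exactly to zero while large ones are shrunk by `lam`, which is why LASSO performs variable selection and estimation at the same time.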
|
2 |
Discrete Weibull regression model for count data
Kalktawi, Hadeel Saleh January 2017
Data can be collected in the form of counts in many situations. For example, the number of deaths from an accident, the number of days until a machine stops working, or the number of annual visitors to a city may all be considered interesting variables for study. This study is motivated by two facts. First, the continuous Weibull distribution plays a vital role in survival analyses and failure time studies; hence, the discrete Weibull (DW) is introduced analogously to the continuous Weibull distribution (see Nakagawa and Osaki (1975) and Kulasekera (1994)). Second, researchers usually focus on modeling count data, which take only non-negative integer values, as a function of other variables. Therefore, the DW introduced by Nakagawa and Osaki (1975) is considered to investigate the relationship between count data and a set of covariates. In particular, this DW is generalised by allowing one of its parameters to be a function of covariates. Although Poisson regression can be considered the most common model for count data, it is constrained by its equi-dispersion (the assumption of equal mean and variance). Thus, negative binomial (NB) regression has become the most widely used method for count data regression. However, even though the NB can be suitable for over-dispersed cases, it cannot be considered the best choice for modeling under-dispersed data. Hence, models that deal with the problem of under-dispersion are required, such as the generalized Poisson regression model (Efron (1986) and Famoye (1993)) and COM-Poisson regression (Sellers and Shmueli (2010) and Sáez-Castillo and Conde-Sánchez (2013)). Generally, all of these models can be considered modifications and developments of Poisson models. However, this thesis develops a model based on a simple distribution with no modification.
Thus, if the data do not follow the dispersion structure of the Poisson or NB, the true structure generating the data should be detected. Applying a model that can handle different dispersions would be of great interest. Thus, in this study, the DW regression model is introduced. Besides the flexibility of the DW to model under- and over-dispersion, it is a good model for inhomogeneous and highly skewed data, such as those with excessive zero counts, which are more dispersed than Poisson. Although these data can be fitted well using some developed models, namely the zero-inflated and hurdle models, the DW demonstrates a good fit and has less complexity than these modified models. However, there could be some cases in which a special model that separates the probability of zeros from that of the positive counts must be applied. To cope with the problem of too many observed zeros, two modifications of the DW regression are developed, namely the zero-inflated discrete Weibull (ZIDW) and hurdle discrete Weibull (HDW) models. Furthermore, this thesis considers another type of data, where the response count variable is censored from the right, which is observed in many experiments. Applying the standard models to these types of data without considering the censoring may yield misleading results. Thus, the censored discrete Weibull (CDW) model is employed for this case. In addition, this thesis introduces the median discrete Weibull (MDW) regression model for investigating the effect of covariates on the count response through the median, which is more appropriate for the skewed nature of count data. In other words, the likelihood of the DW model is re-parameterized to explain the effect of the predictors directly on the median.
Thus, in comparison with the generalized linear models (GLMs), MDW and GLMs both investigate the relations to a set of covariates via certain location measurements; however, GLMs consider the means, which is not the best way to represent skewed data. These DW regression models are investigated through simulation studies to illustrate their performance. In addition, they are applied to some real data sets and compared with the related count models, mainly Poisson and NB models. Overall, the DW models provide a good fit to the count data as an alternative to the NB models in the over-dispersion case and are much better fitting than the Poisson models. Additionally, contrary to the NB model, the DW can be applied for the under-dispersion case.
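The claim that the DW can handle both over- and under-dispersion can be checked numerically. The sketch below assumes the Nakagawa and Osaki (1975) form of the distribution, with probability mass q^(x^beta) - q^((x+1)^beta) for x = 0, 1, 2, ...; the function names, the parameter values, and the truncation point `upper` are illustrative assumptions, and no regression link from the thesis is implemented.

```python
def dw_pmf(x: int, q: float, beta: float) -> float:
    """Discrete Weibull pmf, assuming 0 < q < 1 and beta > 0."""
    return q ** (x ** beta) - q ** ((x + 1) ** beta)


def dw_mean_var(q: float, beta: float, upper: int = 60):
    """Approximate mean and variance by summing the pmf over 0..upper.

    The truncation point is an assumption; it is adequate only when the
    probability mass beyond `upper` is negligible.
    """
    probs = [dw_pmf(x, q, beta) for x in range(upper + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    second_moment = sum(x * x * p for x, p in enumerate(probs))
    return mean, second_moment - mean ** 2


# beta = 1 reduces to a geometric distribution (variance exceeds mean).
mean_over, var_over = dw_mean_var(q=0.5, beta=1.0)

# A larger beta concentrates the mass (variance below mean).
mean_under, var_under = dw_mean_var(q=0.9, beta=3.0)
```

Comparing each variance with its mean shows the two dispersion regimes that a single Poisson model, with variance equal to mean, cannot represent.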
|
3 |
Point process modeling as a framework to dissociate intrinsic and extrinsic components in neural systems
Fiddyment, Grant Michael 03 November 2016
Understanding the factors shaping neuronal spiking is a central problem in neuroscience. Neurons may have complicated sensitivity and, often, are embedded in dynamic networks whose ongoing activity may influence their likelihood of spiking. One approach to characterizing neuronal spiking is the point process generalized linear model (GLM), which decomposes spike probability into explicit factors. This model represents a higher level of abstraction than biophysical models, such as Hodgkin-Huxley, but benefits from principled approaches for estimation and validation.
Here we address how to infer factors affecting neuronal spiking in different types of neural systems. We first extend the point process GLM, most commonly used to analyze single neurons, to model population-level voltage discharges recorded during human seizures. Both GLMs and descriptive measures reveal rhythmic bursting and directional wave propagation. However, we show that GLM estimates account for covariance between these features in a way that pairwise measures do not. Failure to account for this covariance leads to confounded results. We interpret the GLM results to speculate on the mechanisms of seizure and to suggest new therapies.
The second chapter highlights the flexibility of the GLM. We use this single framework to analyze enhancement, a statistical phenomenon, in three distinct systems. Here we define the enhancement score, a simple measure of shared information between spike factors in a GLM. We demonstrate how to estimate the score, including confidence intervals, using simulated data. In real data, we find that enhancement occurs prominently during human seizure, while redundancy tends to occur in mouse auditory networks. We discuss implications for physiology, particularly during seizure.
In the third part of this thesis, we apply point process modeling to spike trains recorded from single units in vitro under external stimulation. We re-parameterize models in a low-dimensional and physically interpretable way; namely, we represent their effects in principal component space. We show that this approach successfully separates the neurons observed in vitro into different classes consistent with their gene expression profiles.
Taken together, this work contributes a statistical framework for analyzing neuronal spike trains and demonstrates how it can be applied to create new insights into clinical and experimental data sets.
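The idea of decomposing spike probability into explicit factors can be sketched with a discrete-time Poisson approximation of a point process GLM, in which the conditional intensity is the exponential of a linear combination of covariates. This is a generic illustration, not the models of the thesis; the covariates, coefficients, and bin width `dt` are hypothetical.

```python
import math


def conditional_intensity(beta0, betas, covariates):
    """Spike rate (spikes per second) as exp of a linear predictor."""
    return math.exp(beta0 + sum(b * x for b, x in zip(betas, covariates)))


def log_likelihood(spike_counts, covariate_rows, beta0, betas, dt):
    """Poisson log-likelihood of binned spike counts.

    spike_counts[t] is the number of spikes in bin t of width dt seconds;
    covariate_rows[t] holds the (assumed) covariates for that bin.
    """
    ll = 0.0
    for n, row in zip(spike_counts, covariate_rows):
        mu = conditional_intensity(beta0, betas, row) * dt  # expected count
        ll += n * math.log(mu) - mu - math.lgamma(n + 1)
    return ll


# Toy data: three 1 ms bins, one spike, one hypothetical covariate.
counts = [0, 1, 0]
rows = [[0.0], [1.0], [0.0]]
ll_null = log_likelihood(counts, rows, beta0=0.0, betas=[0.0], dt=0.001)
ll_fit = log_likelihood(counts, rows, beta0=0.0, betas=[2.0], dt=0.001)
```

Maximizing this log-likelihood over the coefficients is what fitting the GLM amounts to; here the covariate coincides with the spike, so a positive coefficient yields a higher log-likelihood than the null model.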
|
4 |
Criteria for generalized linear model selection based on Kullback's symmetric divergence
Acion, Cristina Laura 01 December 2011
Model selection criteria frequently arise from constructing estimators of discrepancy measures used to assess the disparity between the data generating model and a fitted approximating model. The widely known Akaike information criterion (AIC) results from utilizing Kullback's directed divergence (KDD) as the targeted discrepancy. Under appropriate conditions, AIC serves as an asymptotically unbiased estimator of KDD. The directed divergence is an asymmetric measure of separation between two statistical models, meaning that an alternate directed divergence may be obtained by reversing the roles of the two models in the definition of the measure. The sum of the two directed divergences is Kullback's symmetric divergence (KSD).
A comparison of the two directed divergences indicates an important distinction between the measures. When used to evaluate fitted approximating models that are improperly specified, the directed divergence which serves as the basis for AIC is more sensitive towards detecting overfitted models, whereas its counterpart is more sensitive towards detecting underfitted models. Since KSD combines the information in both measures, it functions as a gauge of model disparity which is arguably more balanced than either of its individual components. With this motivation, we propose three estimators of KSD for use as model selection criteria in the setting of generalized linear models: KICo, KICu, and QKIC. These statistics function as asymptotically unbiased estimators of KSD under different assumptions and frameworks.
As with AIC, KICo and KICu are both justified for large-sample maximum likelihood settings; however, asymptotic unbiasedness holds under more general assumptions for KICo and KICu than for AIC. KICo serves as an asymptotically unbiased estimator of KSD in settings where the distribution of the response is misspecified. The asymptotic unbiasedness of KICu holds when the candidate model set includes underfitted models. QKIC is a modification of KICo. In the development of QKIC, the likelihood is replaced by the quasi-likelihood. QKIC can be used as a model selection tool when generalized estimating equations, a quasi-likelihood-based method, are used for parameter estimation. We examine the performance of KICo, KICu, and QKIC relative to other relevant criteria in simulation experiments. We also apply QKIC in a model selection problem for a randomized clinical trial investigating the effect of antidepressants on the temporal course of disability after stroke.
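The relationship between the two directed divergences and the symmetric divergence can be made concrete for discrete distributions. The sketch below is only an illustration of the definitions given in the abstract; the two probability vectors are hypothetical, and none of the criteria KICo, KICu, or QKIC is implemented.

```python
import math


def directed_divergence(p, q):
    """Kullback's directed divergence of q from p (both strictly positive)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_divergence(p, q):
    """Sum of the two directed divergences."""
    return directed_divergence(p, q) + directed_divergence(q, p)


# Hypothetical "generating" and "approximating" distributions.
generating = [0.5, 0.5]
approximating = [0.9, 0.1]

d_forward = directed_divergence(generating, approximating)
d_reverse = directed_divergence(approximating, generating)
d_symmetric = symmetric_divergence(generating, approximating)
```

Reversing the roles of the two models changes the directed divergence, while the symmetric divergence is unchanged, which is the property that motivates using it as a more balanced discrepancy.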
|
5 |
Lasso for Autoregressive and Moving Average Coefficients via Residuals of Unobservable Time Series
Hanh, Nguyen T. January 2018
No description available.
|
6 |
Spatial and temporal population dynamics of yellow perch (Perca flavescens) in Lake Erie
Yu, Hao 19 August 2010
Yellow perch (Perca flavescens) in Lake Erie support valuable commercial and recreational fisheries critical to the local economy and society. The study of yellow perch's temporal and spatial population dynamics is important for both stock assessment and fisheries management. I explore the spatial and temporal variation of the yellow perch population by analyzing the fishery-independent surveys in Lake Erie. Model-based approaches were developed to estimate the relative abundance index, which reflected the temporal variation of the population. I also used design-based approaches to deal with the situation in which population density varied both spatially and temporally.
I first used model-based approaches to explore the spatial and temporal variation of the yellow perch population and to develop the relative abundance index needed. Generalized linear models (GLM), spatial generalized linear models (s-GLM), and generalized additive models (GAM) were compared by examining the goodness-of-fit, reduction of spatial autocorrelation, and prediction errors from cross-validation. The relationship between yellow perch density distribution and spatial and environmental factors was also studied. I found that GAM showed the best goodness-of-fit, as measured by AIC, and the lowest prediction errors, but s-GLM resulted in the best reduction of spatial autocorrelation. Both performed better than GLM for yellow perch relative abundance index estimation. I then applied design-based approaches to study the spatial and temporal population dynamics of yellow perch through both practical data analysis and simulation. The currently used approach in Lake Erie is stratified random sampling (StRS). Traditional sampling designs (simple random sampling (SRS) and StRS) and adaptive sampling designs (adaptive two-phase sampling (ATS), adaptive cluster sampling (ACS), and adaptive two-stage sequential sampling (ATSS)) for fishery-independent surveys were compared. In terms of accuracy and precision, ATS performed better than SRS, StRS, ACS and ATSS for yellow perch fishery-independent survey data in Lake Erie. Model-based approaches were further studied by including geostatistical models. The performance of the GLM and GAM models and of geostatistical models (spatial interpolation) in analyzing the temporal and spatial variation of the yellow perch population was compared through a simulation study. This is the first time that these two types of model-based approaches have been compared in fisheries. I found that the arithmetic mean (AM) method was preferred only when neither environmental factors nor spatial information on sampling locations were available.
If the survey cannot cover the distribution area of the population due to biased design or lack of sampling locations, GLMs and GAMs are preferable to spatial interpolation (SI). Otherwise, SI is a good alternative model for estimating the relative abundance index. SI has rarely been applied in fisheries.
Different models may be recommended for different species/fisheries when we estimate their spatial-temporal dynamics, and also the most appropriate survey designs may be different for different species. However, the criteria and approaches for the comparison of both model-based and design-based approaches will be applied for different species or fisheries. / Ph. D.
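As a point of reference for the design-based comparison, the standard stratified random sampling estimator of a population mean weights each stratum's sample mean by the stratum's share of the population. The sketch below is a textbook illustration, not the estimator code used in the thesis; the stratum sizes and catch values are hypothetical.

```python
def stratified_mean(stratum_sizes, stratum_samples):
    """Stratified estimate of the population mean.

    stratum_sizes[h] is the number of sampling units in stratum h;
    stratum_samples[h] is the list of observations drawn from stratum h.
    """
    total = sum(stratum_sizes)
    estimate = 0.0
    for size, sample in zip(stratum_sizes, stratum_samples):
        weight = size / total                 # stratum weight W_h
        sample_mean = sum(sample) / len(sample)
        estimate += weight * sample_mean
    return estimate


# Hypothetical survey: two strata with unequal sizes and sampling effort.
sizes = [100, 300]
catches = [[2.0, 4.0], [10.0, 10.0, 10.0]]

strs_estimate = stratified_mean(sizes, catches)
pooled_mean = sum(sum(s) for s in catches) / sum(len(s) for s in catches)
```

The stratified estimate differs from the pooled arithmetic mean whenever sampling effort is not proportional to stratum size, which is one reason the choice of design matters for an abundance index.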
|
7 |
Robust Diagnostics for the Logistic Regression Model With Incomplete Data
范少華 Unknown Date
Atkinson and Riani (2001) apply the forward search algorithm to the detection of multiple outliers in binomial data. In this thesis, we extend a similar idea to identify multiple outliers in generalized linear models when part of the data are missing. The algorithm starts with an imputation method to fill in the missing observations in the data, and then uses the forward search algorithm to confirm outliers. The proposed method can overcome the masking effect, which commonly occurs when multiple outliers exist in the data. Real data are used to illustrate the procedure, and satisfactory results are obtained.
|
8 |
Betydelse av lövinslag, död ved och variation i träddiameter för artrikedomen hos småfåglar / Importance of deciduous trees, dead wood and variation in tree diameter for species richness in birds
Forssén, Annika January 2011
Forest management contributes to changes in forest structure by turning heterogeneous forests of varied age into homogeneous forests of similar age, and thus affects bird species that depend on the structures or habitats lost during forestry. This study investigated how the amount of deciduous trees, the amount of dead wood, and variation in tree diameter affect bird diversity. The purpose of the study was to provide forest management guidelines for increasing bird diversity. The study was conducted along 65 transects in forests of different structure south of Linköping, Sweden. Along each transect, both birds and vegetation were inventoried, and the trees were measured in 5 circles. The bird and vegetation data were analysed using generalized linear models. The results showed that the amount of dead wood and variation in tree diameter had the strongest effects on bird diversity, followed to some extent by the amount of deciduous trees. By applying this knowledge of the positive effects on birds of increasing the amount of dead wood, deciduous trees, and variation in tree diameter in forests, it is possible to create better conditions for maintaining species richness and diversity.
|
9 |
SINGLE UNIT AND ENSEMBLE RESPONSE PROPERTIES OF THE GUSTATORY CORTEX IN THE AWAKE RAT
Stapleton, Jennifer Rebecca 10 August 2007
Most studies of gustatory coding have been performed in either anesthetized or awake, passively stimulated rats. In this dissertation the influences of behavioral state on gustatory processing in awake rats are described. In the first set of experiments, the effects of non-contingent tastant delivery on the chemical tuning of single neurons were explored. Tastants were delivered non-contingently through intra-oral cannulas to restrained, non water-deprived rats while single unit responses were recorded from the gustatory cortex (GC). As the subjects' behavior progressed from acceptance to rejection of the tastants, the chemical tuning of the neurons changed as well. This suggests that the subjects' behavioral state powerfully influences gustatory processing. In the second set of experiments, rats were trained to lick for fluid reinforcement on an FR5 schedule while single unit activity was recorded from GC. In this case, the chemical tuning was much more stable. Under this paradigm, chemosensory responses were rapid (~ 150 ms) and broadly tuned. In the third study, it was found that ensembles of GC neurons could discriminate between tastants and their concentrations on a single trial basis, and such discrimination was accomplished with a combination of rate and temporal coding. Ensembles of GC neurons also anticipated the identity of the upcoming stimulus when the tastant delivery was predictable. Finally, it was found that ensembles of GC neurons could discriminate between the bitter stimuli nicotine and quinine. Nicotine is both a bitter tastant and a trigeminal stimulant, and when the acetylcholine receptors in the lingual epithelium were blocked with mecamylamine, the ensembles failed to discriminate nicotine from quinine.
|
10 |
Bayesian classification and survival analysis with curve predictors
Wang, Xiaohui 15 May 2009
We propose classification models for binary and multicategory data where the
predictor is a random function. The functional predictor could be irregularly and
sparsely sampled or characterized by high dimension and sharp localized changes. In
the former case, we employ Bayesian modeling utilizing flexible spline basis which is
widely used for functional regression. In the latter case, we use Bayesian modeling
with wavelet basis functions which have nice approximation properties over a large
class of functional spaces and can accommodate varieties of functional forms observed
in real life applications. We develop a unified hierarchical model which accommodates
both the adaptive spline or wavelet based function estimation model and
the logistic classification model. These two models are coupled together to borrow
strength from each other in this unified hierarchical framework. The use of Gibbs
sampling with conjugate priors for posterior inference makes the method computationally
feasible. We compare the performance of the proposed models with the naive
models as well as existing alternatives by analyzing simulated as well as real data. We
also propose a Bayesian unified hierarchical model based on a proportional hazards model and generalized linear model for survival analysis with irregular longitudinal
covariates. This relatively simple joint model has two advantages. One is that using
spline basis simplifies the parameterizations while a flexible non-linear pattern of
the function is captured. The other is that joint modeling framework allows sharing
of the information between the regression of functional predictors and proportional
hazards modeling of survival data to improve the efficiency of estimation. The novel
method can be used not only with a single functional predictor, but also with multiple
functional predictors. Our methods are applied to analyze real data sets and
compared with a parameterized regression method.
|