411 |
Případové studie pro statistickou analýzu dat / Case studies for statistical data analysis. Chroboček, Michal. January 2009.
This thesis deals with questions related to the creation of case studies for statistical data analysis using applied computer technology. The main aim is to show the solution of statistical case studies in the field of electrical engineering. Each case study includes the task, an exemplary solution, and a conclusion. Emphasis is placed on the clarity of the explained theory and on the understanding and interpretation of results. The thesis can be used for practical education in applied statistical methods and is supplemented with commented outputs from Minitab. A trial version of Minitab was used to solve the case studies.
|
412 |
Nonparametric statistical inference for functional brain information mapping. Stelzer, Johannes. 16 April 2014.
An ever-increasing number of functional magnetic resonance imaging (fMRI) studies now use information-based multi-voxel pattern analysis (MVPA) techniques to decode mental states, achieving significantly greater sensitivity than univariate analysis frameworks. The two most prominent MVPA methods for information mapping are searchlight decoding and classifier weight mapping. These new MVPA brain mapping methods, however, have also posed new challenges for analysis and statistical inference at the group level. In this thesis, I discuss why the usual procedure of performing t-tests on MVPA-derived information maps across subjects to produce a group statistic is inappropriate. I propose a fully nonparametric solution to this problem, which achieves higher sensitivity than the most commonly used t-based procedure. The proposed method is based on resampling and preserves the spatial dependencies in the MVPA-derived information maps, which makes it possible to control the multiple testing problem through cluster size. Using a volumetric searchlight decoding procedure and classifier weight maps, I demonstrate the validity and sensitivity of the new approach on both simulated and real fMRI data sets. Compared to the standard t-test procedure implemented in SPM8, the new approach showed higher sensitivity and spatial specificity.
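To make the resampling idea concrete, here is a minimal Python sketch of cluster-size-controlled permutation inference on a 1-D map of decoding accuracies. It illustrates the general principle rather than the thesis's exact pipeline; the sign-flipping null, the thresholds, and the 1-D layout are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cluster_sizes(mask):
    """Sizes of contiguous runs of True in a 1-D boolean mask."""
    sizes, run = [], 0
    for m in mask:
        if m:
            run += 1
        elif run:
            sizes.append(run)
            run = 0
    if run:
        sizes.append(run)
    return sizes

n_subj, n_vox, n_perm = 20, 400, 1000
maps = rng.normal(0.5, 0.05, (n_subj, n_vox))   # subject accuracy maps at chance
maps[:, 150:180] += 0.08                        # one truly informative region

obs_mean = maps.mean(axis=0)

# Null distribution: random sign flips of each subject's deviation from chance
null_means = np.empty((n_perm, n_vox))
for b in range(n_perm):
    signs = rng.choice([-1.0, 1.0], size=(n_subj, 1))
    null_means[b] = 0.5 + (signs * (maps - 0.5)).mean(axis=0)

vox_thresh = np.quantile(null_means, 0.999)     # pooled voxel-level threshold
null_cluster_max = [max(cluster_sizes(m > vox_thresh), default=0)
                    for m in null_means]
k_min = np.quantile(null_cluster_max, 0.95)     # cluster-size cutoff (~5% FWER)

observed = cluster_sizes(obs_mean > vox_thresh)
print("clusters surviving size control:", [k for k in observed if k >= k_min])
```

Because the permuted maps keep each subject's full spatial structure, the null cluster sizes reflect the same spatial dependencies as the observed map, which is the key property the thesis exploits.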
The second goal of this thesis is the comparison of two widely used information mapping approaches, the searchlight technique and classifier weight mapping. Both methods take the spatially distributed patterns of activation into account in order to predict stimulus conditions; however, the searchlight method operates solely on the local scale. The searchlight decoding technique has furthermore been found to be prone to spatial inaccuracies: the spatial extent of informative areas is generally exaggerated, and their spatial configuration is distorted. In this thesis, I compare searchlight decoding with linear classifier weight mapping, both within the nonparametric statistical framework proposed above, using a simulation and ultra-high-field 7T experimental data. The searchlight method led to spatial inaccuracies that are especially noticeable in high-resolution fMRI data. In contrast, the weight mapping method was more spatially precise, revealing both the informative anatomical structures and the direction in which voxels contribute to the classification. By maximizing the spatial accuracy of ultra-high-field fMRI results, such global multivariate methods provide a substantial improvement for characterizing structure-function relationships.
|
413 |
Estimation non paramétrique adaptative dans la théorie des valeurs extrêmes : application en environnement / Nonparametric adaptive estimation in the extreme value theory : application in ecology. Pham, Quang Khoai. 09 January 2015.
L'objectif de cette thèse est de développer des méthodes statistiques basées sur la théorie des valeurs extrêmes pour estimer des probabilités d'évènements rares et des quantiles extrêmes conditionnels. Nous considérons une suite de variables aléatoires indépendantes $X_{t_1}, X_{t_2}, \dots, X_{t_n}$ associées aux temps $0 \le t_1 < \dots < t_n \le T_{\max}$, où $X_{t_i}$ a la fonction de répartition $F_{t_i}$ et $F_t$ est la loi conditionnelle de $X$ sachant $T = t \in [0, T_{\max}]$. Pour chaque $t \in [0, T_{\max}]$, nous proposons un estimateur non paramétrique de quantiles extrêmes de $F_t$. L'idée de notre approche consiste à ajuster, pour chaque $t \in [0, T_{\max}]$, la queue de la distribution $F_t$ par une distribution de Pareto de paramètre $\theta_{t,\tau}$ à partir d'un seuil $\tau$. Le paramètre $\theta_{t,\tau}$ est estimé en utilisant un estimateur non paramétrique à noyau de taille de fenêtre $h$ basé sur les observations plus grandes que $\tau$. Nous proposons une procédure de tests séquentiels pour déterminer le seuil $\tau$ et nous obtenons le paramètre $h$ suivant deux méthodes : la validation croisée et une approche adaptative. Sous certaines hypothèses de régularité, nous montrons que l'estimateur adaptatif proposé de $\theta_{t,\tau}$ est consistant et nous donnons sa vitesse de convergence. Nous proposons également une méthode pour choisir simultanément le seuil $\tau$ et la taille de la fenêtre $h$. Finalement, les procédures proposées sont étudiées sur des données simulées et sur des données réelles dans le but d'aider à la surveillance de systèmes aquatiques. / The objective of this PhD thesis is to develop statistical methods based on the theory of extreme values to estimate the probabilities of rare events and conditional extreme quantiles. We consider independent random variables $X_{t_1}, \dots, X_{t_n}$ associated to a sequence of times $0 \le t_1 < \dots < t_n \le T_{\max}$, where $X_{t_i}$ has distribution function $F_{t_i}$ and $F_t$ is the conditional distribution of $X$ given $T = t \in [0, T_{\max}]$. For each $t \in [0, T_{\max}]$, we propose a nonparametric adaptive estimator for extreme quantiles of $F_t$. The idea of our approach is to adjust the tail of the distribution function $F_t$ with a Pareto distribution of parameter $\theta_{t,\tau}$ starting from a threshold $\tau$. The parameter $\theta_{t,\tau}$ is estimated using a nonparametric kernel estimator of bandwidth $h$ based on the observations larger than $\tau$. We propose a sequential testing procedure for the choice of the threshold $\tau$ and we determine the bandwidth $h$ by two methods: cross-validation and an adaptive procedure. Under some regularity assumptions, we prove that the adaptive estimator of $\theta_{t,\tau}$ is consistent and we determine its rate of convergence. We also propose a method to choose the threshold $\tau$ and the bandwidth $h$ simultaneously. Finally, we study the proposed procedures by simulation and on a real data set to support the monitoring of aquatic systems.
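The tail-fitting idea lends itself to a compact sketch. The Python code below, a minimal illustration under simplified assumptions rather than the thesis's full estimator, computes the conditional tail index as a kernel-weighted average of log-excesses above a threshold and converts it into an extreme quantile; the threshold and bandwidth are fixed by hand here, whereas the thesis selects them by sequential testing and cross-validation.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def extreme_quantile(t, times, x, tau, h, p):
    """Estimate the conditional quantile of exceedance probability p at time t.

    Tail model: P(X > x | T = t) ~ p_tau(t) * (x / tau)**(-1/theta_t), x > tau.
    """
    w = epanechnikov((times - t) / h)
    exc = x > tau
    # kernel-weighted Hill-type estimate of the Pareto tail index theta_t
    theta = np.sum(w[exc] * np.log(x[exc] / tau)) / np.sum(w[exc])
    # kernel-weighted probability of exceeding the threshold tau near time t
    p_tau = np.sum(w * exc) / np.sum(w)
    return tau * (p_tau / p) ** theta

# toy example: Pareto tails with an index that drifts slowly over time
rng = np.random.default_rng(1)
n = 5000
times = np.sort(rng.uniform(0, 1, n))
theta_true = 0.3 + 0.2 * times                # tail index drifts over time
x = rng.pareto(1 / theta_true) + 1.0          # heavy-tailed observations

print(extreme_quantile(t=0.5, times=times, x=x,
                       tau=np.quantile(x, 0.9), h=0.1, p=0.001))
```

Inverting the fitted tail, the quantile follows from solving $p = \hat p_\tau(t) (q/\tau)^{-1/\hat\theta_{t,\tau}}$ for $q$, which is the last line of the estimator.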
|
414 |
Selective Multivariate Applications In Forensic Science. Rinke, Caitlin. 01 January 2012.
A 2009 report published by the National Research Council addressed the need for improvements in the field of forensic science. The report emphasized the need for more rigorous scientific analysis within many forensic science disciplines, and for established limitations and error rates determined through statistical analysis. This research focused on multivariate statistical techniques for the analysis of spectral data obtained for multiple forensic applications, including samples of automobile float glasses and paints, bones, metal transfers, ignitable liquids and fire debris, and organic compounds including explosives. The statistical techniques were used for two types of data analysis: classification and discrimination. Statistical methods including linear discriminant analysis and a novel soft classification method were used to classify forensic samples against a compiled library. The novel soft classification method combined three statistical steps, Principal Component Analysis (PCA), Target Factor Analysis (TFA), and Bayesian Decision Theory (BDT), to provide classification based on posterior probabilities of class membership. The posterior probabilities provide a statistical probability of classification which can aid a forensic analyst in reaching a conclusion. The second analytical approach applied nonparametric methods to provide the means for discrimination between samples. Nonparametric methods are performed as hypothesis tests and do not assume a normal distribution of the analytical figures of merit. The nonparametric permutation test was applied to forensic applications to determine the similarity between two samples and provide discrimination rates. Both the classification and discrimination methods were applied to data acquired from multiple instrumental methods: Laser-Induced Breakdown Spectroscopy (LIBS), Fourier Transform Infrared Spectroscopy (FTIR), Raman spectroscopy, and Gas Chromatography-Mass Spectrometry (GC-MS). Some of these instrumental methods are already applied to forensic applications, such as GC-MS for the analysis of ignitable liquid and fire debris samples, while others bring instrumental analysis to areas of forensic science that currently lack it, such as LIBS for the analysis of metal transfers. This research investigates the combination of these instrumental techniques with multivariate statistical techniques in new approaches to forensic applications, to assist in improving the field of forensic science.
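As an illustration of the discrimination approach, the sketch below implements a generic two-sample permutation test on replicate spectra. The Euclidean distance between mean spectra is an assumed figure of merit chosen for illustration; the dissertation's actual statistic may differ.

```python
import numpy as np

def permutation_test(a, b, n_perm=10000, rng=None):
    """Two-sample permutation test on replicate spectra.

    a, b : (n_a, n_channels) and (n_b, n_channels) arrays of spectra.
    Statistic: Euclidean distance between mean spectra. The p-value is the
    fraction of label permutations with a distance at least as large.
    """
    if rng is None:
        rng = np.random.default_rng()
    pooled = np.vstack([a, b])
    n_a = len(a)
    obs = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        pa, pb = pooled[idx[:n_a]], pooled[idx[n_a:]]
        if np.linalg.norm(pa.mean(axis=0) - pb.mean(axis=0)) >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)   # add-one correction

# toy spectra: replicates of one source vs. a subtly different source
rng = np.random.default_rng(2)
base = np.sin(np.linspace(0, 6, 200))
same = base + rng.normal(0, 0.1, (5, 200))
diff = base * 1.03 + rng.normal(0, 0.1, (5, 200))
print("p =", permutation_test(same, diff, n_perm=2000, rng=rng))
```

Because the null distribution is built by reshuffling the replicate labels themselves, no normality assumption on the figures of merit is needed, which is the property the abstract highlights.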
|
415 |
Profile Monitoring with Fixed and Random Effects using Nonparametric and Semiparametric Methods. Abdel-Salam, Abdel-Salam Gomaa. 20 November 2009.
Profile monitoring is a relatively new approach in quality control, best used where the process data follow a profile (or curve) at each time period. The essential idea of profile monitoring is to model the profile via parametric, nonparametric, or semiparametric methods and then monitor the fitted profiles or the estimated random effects over time to determine whether the profiles have changed. The majority of previous studies in profile monitoring focused on parametric modeling of either linear or nonlinear profiles, with both fixed and random effects, under the assumption of correct model specification.
Our work considers those cases where the parametric model for the family of profiles is unknown, or at least uncertain. Consequently, we consider monitoring profiles via two techniques: a nonparametric technique, and a semiparametric procedure that combines parametric and nonparametric profile fits, which we refer to as model robust profile monitoring (MRPM). We also incorporate a mixed model approach into both the parametric and nonparametric model fits. For mixed effects models, the MMRPM method extends MRPM by incorporating a mixed model approach into both the parametric and nonparametric fits, accounting for the correlation within profiles and treating the collection of profiles as a random sample from a common population.
For each case, we formulated two Hotelling's T² statistics, one based on the estimated random effects and one based on the fitted values, and obtained the corresponding control limits. In addition, we used two different estimators of the variance-covariance matrix: one based on the pooled sample variance-covariance matrix and one based on successive differences.
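A compact sketch of the two constructions is given below: Hotelling's T² is computed for each profile's estimated random-effect vector under either the pooled covariance estimator or the successive-differences estimator. Control-limit calibration is omitted, the data are simulated, and all names are illustrative.

```python
import numpy as np

def t2_statistics(E, method="diff"):
    """Hotelling's T^2 for each profile's estimated random-effect vector.

    E : (m, p) array, one p-dimensional random-effect estimate per profile.
    method='pooled' uses the usual sample covariance; method='diff' uses the
    successive-differences estimator, which is less inflated by sustained shifts.
    """
    m, p = E.shape
    center = E - E.mean(axis=0)
    if method == "pooled":
        S = center.T @ center / (m - 1)
    else:
        D = np.diff(E, axis=0)              # successive differences
        S = D.T @ D / (2 * (m - 1))
    Sinv = np.linalg.inv(S)
    # quadratic form (x_i - xbar)' Sinv (x_i - xbar) for every profile i
    return np.einsum("ij,jk,ik->i", center, Sinv, center)

rng = np.random.default_rng(3)
effects = rng.normal(0, 1, (30, 3))
effects[20:] += [1.5, 0, 0]                 # sustained shift in later profiles
print(np.round(t2_statistics(effects, "diff"), 2))
```

Running both methods on the same shifted data shows the practical difference: the pooled estimator absorbs the shift into the covariance and mutes the T² signal, while the successive-differences estimator does not.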
A Monte Carlo study was performed to compare the integrated mean square errors (IMSE) and the probability of signal of the parametric, nonparametric, and semiparametric approaches. Both correlated and uncorrelated error structure scenarios were evaluated for varying amounts of model misspecification, number of profiles, number of observations per profile, shift location, and in- and out-of-control situations. The semiparametric (MMRPM) method was competitive with, and often clearly superior to, the parametric and nonparametric methods over all levels of misspecification, in both the correlated and uncorrelated scenarios. For a correctly specified model, the IMSE and the simulated probability of signal for the parametric and MMRPM methods were identical (or nearly so).
For the severe model misspecification case, the nonparametric and MMRPM methods were identical (or nearly so). For the mild model misspecification case, the MMRPM method was superior to the parametric and nonparametric methods. Therefore, this simulation supports the claim that the MMRPM method is robust to model misspecification.
In addition, the MMRPM method performed better for data sets with a correlated error structure. The performance of the nonparametric and MMRPM methods also improved as the number of observations per profile increased, since more observations over the same range of X generally allow more knots to be used by the penalized spline method, resulting in greater flexibility and improved fits in the nonparametric curves and, consequently, the semiparametric curves.
The parametric, nonparametric, and semiparametric approaches were used to fit the relationship between the torque produced by an engine and engine speed in the automotive industry. We then used a Hotelling's T² statistic based on the estimated random effects to conduct Phase I studies to identify outlying profiles. The parametric, nonparametric, and semiparametric methods all showed that the process was stable. Although all three methods reach the same conclusion regarding the in-control status of each profile, the nonparametric and MMRPM results provide a better description of the actual behavior of each profile. Thus, the nonparametric and MMRPM methods give the user a greater ability to properly interpret the true relationship between engine speed and torque for this type of engine, and an increased likelihood of detecting unusual engines in future production. Finally, we conclude that the nonparametric and semiparametric approaches performed better than the parametric approach when the user's model is misspecified. The case study demonstrates that the proposed nonparametric and semiparametric methods are more efficient, flexible, and robust to model misspecification for Phase I profile monitoring in a practical application.
Thus, our methods are robust to the common problem of model misspecification. We also found that both the nonparametric and semiparametric methods result in charts with good abilities to detect changes in Phase I data, and in charts with easily calculated control limits. The proposed methods provide greater flexibility and efficiency than current parametric methods used in Phase I profile monitoring, which rely on correct model specification, an unrealistic assumption in many practical industrial applications. / Ph. D.
|
416 |
Semiparametric Varying Coefficient Models for Matched Case-Crossover Studies. Ortega Villa, Ana Maria. 23 November 2015.
Semiparametric modeling is a combination of the parametric and nonparametric models in which some functions follow a known form and some others follow an unknown form. In this dissertation we made contributions to semiparametric modeling for matched case-crossover data.
In matched case-crossover studies, it is generally accepted that the covariates on which a case and its associated controls are matched cannot exert a confounding effect on the independent predictors included in the conditional logistic regression model: any stratum effect is removed by conditioning on the fixed number of cases and controls in the stratum. However, matching covariates such as time and/or spatial location often act as important effect modifiers, and failing to include them leads to incorrect statistical estimation, prediction, and inference. Hence, in this dissertation we propose several approaches that allow the inclusion of time and spatial location, as well as other effect modifications such as heterogeneous subpopulations among the data.
To address effect modification due to time, three methods are developed: the first is a parametric approach, the second is a semiparametric penalized approach, and the third is a semiparametric Bayesian approach. We demonstrate the advantage of the one-stage semiparametric approaches using both a simulation study and an epidemiological example, a 1-4 bi-directional case-crossover study of childhood aseptic meningitis and drinking water turbidity.
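To fix ideas, the following sketch shows one generic way a time-varying coefficient can enter a conditional logistic likelihood for 1-4 matched data: expand beta(t) in a spline basis and maximize the conditional log-likelihood. This is an unpenalized illustration of the general construction, not the dissertation's penalized or Bayesian estimators, and all names and settings are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def spline_basis(t, knots):
    """Truncated-power cubic spline basis: 1, t, t^2, t^3, (t - k)^3_+."""
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def negloglik(gamma, x, B):
    """Negative conditional log-likelihood for 1-4 matched sets.

    x : (S, 5) exposures; column 0 holds the case, columns 1..4 the controls.
    B : (S, K) spline basis evaluated at each stratum's matching time.
    """
    beta = B @ gamma                  # beta(t_s), one value per stratum
    eta = x * beta[:, None]           # within-stratum linear predictors
    return -(eta[:, 0] - logsumexp(eta, axis=1)).sum()

rng = np.random.default_rng(4)
S, M = 300, 4
t = rng.uniform(0, 1, S)
beta_true = np.sin(2 * np.pi * t)     # time-varying log odds ratio
x = rng.normal(0, 1, (S, M + 1))

# Draw which member of each matched set is the case, then move it to column 0
prob = np.exp(x * beta_true[:, None])
prob /= prob.sum(axis=1, keepdims=True)
for s in range(S):
    c = rng.choice(M + 1, p=prob[s])
    x[s, [0, c]] = x[s, [c, 0]]

knots = np.array([0.25, 0.5, 0.75])
B = spline_basis(t, knots)
fit = minimize(negloglik, np.zeros(B.shape[1]), args=(x, B), method="BFGS")
print("beta(0.5) estimate:", (spline_basis(np.array([0.5]), knots) @ fit.x)[0])
```

The stratum effect cancels out of each conditional likelihood term, so only the basis coefficients of beta(t) are estimated, which is what makes effect modification by the matching time recoverable.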
To address effect modification due to time and spatial location, two methods are developed. The first is a semiparametric spatial-temporal varying coefficient model for a small number of locations; the second is a semiparametric spatial-temporal varying coefficient model appropriate when the number of locations among the subjects is medium to large. We demonstrate the accuracy of these approaches using simulation studies and, when appropriate, an epidemiological example from a 1-4 bi-directional case-crossover study.
Finally, to explore further effect modifications by heterogeneous subpopulations among strata, we propose a nonparametric Bayesian approach constructed with Dirichlet process priors, which clusters subpopulations and assesses heterogeneity. We demonstrate the accuracy of our approach using a simulation study, as well as an example from a 1-4 bi-directional case-crossover study. / Ph. D.
|
417 |
Statistical evaluation of critical design storms: short duration storms. Rizou, Maria. 01 July 2000.
No description available.
|
418 |
Variance Change Point Detection under A Smoothly-changing Mean Trend with Application to Liver Procurement. Gao, Zhenguo. 23 February 2018.
Literature on change point analysis mostly requires a sudden change in the data distribution, either in a few parameters or in the distribution as a whole. We are interested in the scenario where the variance of the data makes a significant jump while the mean changes smoothly. It is motivated by a liver procurement experiment with organ surface temperature monitoring. Blindly applying existing change point analysis methods to this example can yield erratic change point estimates, since the smoothly-changing mean violates the sudden-change assumption. In my dissertation, we propose a penalized weighted least squares approach with an iterative estimation procedure that naturally integrates variance change point detection and smooth mean function estimation. Given the variance components, the mean function is estimated by smoothing splines as the minimizer of the penalized weighted least squares. Given the mean function, we propose a likelihood ratio test statistic for identifying the variance change point. The null distribution of the test statistic is derived, together with the rates of convergence of all the parameter estimates. Simulations show excellent performance of the proposed method. The application analysis offers numerical support for non-invasive organ viability assessment by surface temperature monitoring.
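A minimal sketch of the iterative idea follows, with scipy's weighted smoothing spline standing in for the penalized weighted least squares fit and a simple likelihood-ratio scan locating the variance change point; the tuning constants are illustrative and the formal test calibration derived in the dissertation is omitted.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def lr_scan(r):
    """Log-likelihood-ratio scan for a single variance change in residuals r."""
    n = len(r)
    s_all = np.mean(r**2)
    best, k_hat = -np.inf, None
    for k in range(10, n - 10):             # keep both segments non-degenerate
        s1, s2 = np.mean(r[:k]**2), np.mean(r[k:]**2)
        lr = n * np.log(s_all) - k * np.log(s1) - (n - k) * np.log(s2)
        if lr > best:
            best, k_hat = lr, k
    return k_hat, best

def fit(x, y, n_iter=5):
    """Alternate weighted spline mean fit and variance change point scan."""
    w = np.ones_like(y)
    for _ in range(n_iter):
        mean = UnivariateSpline(x, y, w=w, s=len(y))   # weighted smooth mean
        r = y - mean(x)
        k, _ = lr_scan(r)
        sig2 = np.where(np.arange(len(y)) < k,
                        np.mean(r[:k]**2), np.mean(r[k:]**2))
        w = 1.0 / np.sqrt(sig2)             # spline weights act like 1/sigma
    return k, mean

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 400)
sigma = np.where(x < 0.6, 0.05, 0.25)       # variance jumps at x = 0.6
y = np.sin(3 * x) + rng.normal(0, sigma)    # mean drifts smoothly
k, mean = fit(x, y)
print("estimated change point near x =", x[k])
```

The alternation matters: a naive change point scan on the raw data would attribute the smooth drift in the mean to distributional change, which is exactly the failure mode the abstract describes.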
The method above can only yield the variance change point of the temperature at a single point on the organ surface at a time. In practice, an organ is often transplanted as a whole or in part, so it is generally of more interest to study the variance change point for a chunk of the organ. With this motivation, we extend our method to study the variance change point for a chunk of the organ surface. The variances now become functions on a 2D space of locations (longitude and latitude), and the mean is a function on a 3D space of location and time. We model the variance functions by thin-plate splines and the mean function by the tensor product of thin-plate splines and cubic splines. However, the additional dimensions in these functions incur serious computational problems, since the sample size, as the product of the number of locations and the number of sampling time points, becomes too large to run standard multi-dimensional spline models. To overcome the computational hurdle, we introduce a multi-stage subsampling strategy into our modified iterative algorithm. The strategy involves several down-sampling or subsampling steps guided by preliminary statistical measures. Extensive simulations show that the new method efficiently cuts down the computational cost and makes a practically unsolvable problem solvable in reasonable time with satisfactory parameter estimates. Application of the new method to the liver surface temperature monitoring data shows its effectiveness in providing accurate status-change information for a portion of or the whole organ. / Ph. D. / Viability evaluation is the key issue in organ transplant operations: the donated organ must be viable at the time it is transplanted into the recipient. Viability can be assessed by analyzing temperature data monitored on the organ surface. In my dissertation, I have developed two new statistical methods to evaluate the viability status of a prepared organ from its surface temperature. The first method detects a change of viability status at a single spot on the organ surface; the second detects a change of viability condition for selected chunks of the organ. In practice, combining these two methods provides accurate viability status change information for a portion of or the whole organ.
|
419 |
比較使用Kernel和Spline法的傘型迴歸估計 / Compare the Estimation on Umbrella Function by Using Kernel and Spline Regression Method. 賴品霖, Lai, Pin Lin. Unknown Date.
本研究探討常用的兩個無母數迴歸方法，核迴歸與樣條迴歸，在具有傘型限制式下，對於傘型函數的估計與不具限制式下的傘型函數估計比較，同時也探討不同誤差變異對估計結果的影響，並進一步探討受限制下兩方法的估計比較。本研究採用「估計頂點位置與實際頂點位置差」及「誤差平方和」作為衡量估計結果的指標。在帶寬及節點的選取上，本研究採用逐一剔除交互驗證法來篩選。模擬結果顯示，受限制的核函數在誤差變異較大的頂點位置估計較佳，誤差變異縮小時反而頂點位置估計較差，受限制的B-樣條函數也有類似的狀況。而在兩方法的比較上，對於較小的誤差變異，核函數的頂點位置估計能力不如樣條函數，但在整體的誤差平方和上卻沒有太大劣勢，當誤差變異較大時，核函數的頂點位置估計能力有所提升，整體誤差平方和仍舊維持還不錯的結果。 / In this study, we impose an umbrella order constraint on kernel and spline regression models. We compare their estimates using two measures: the difference between the estimated peak location and the true peak location, and the sum of squared differences between the predicted and true values. We use leave-one-out cross-validation to select the bandwidth for the kernel estimator and the number of knots for the spline estimator. The effect of different error sizes is also considered, and several R packages are used in the simulations. The results show that when the error size is larger, the estimation of the peak location improves for both the constrained kernel and spline estimators. Overall, the constrained spline regression tends to provide better peak location estimates than the constrained kernel regression.
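As a sketch of the bandwidth selection step, the Python code below runs leave-one-out cross-validation for a Nadaraya-Watson kernel estimator on an umbrella-shaped mean; the umbrella order constraint studied in the thesis is not imposed here, and all settings are illustrative.

```python
import numpy as np

def nw(x0, x, y, h):
    """Nadaraya-Watson estimate at points x0 with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def loo_cv(x, y, bandwidths):
    """Leave-one-out cross-validation score for each candidate bandwidth."""
    scores = []
    for h in bandwidths:
        err = 0.0
        for i in range(len(x)):
            mask = np.arange(len(x)) != i
            err += (y[i] - nw(x[i:i+1], x[mask], y[mask], h)[0]) ** 2
        scores.append(err / len(x))
    return np.array(scores)

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 1, 100))
y = 1 - 2 * np.abs(x - 0.5) + rng.normal(0, 0.2, 100)  # umbrella mean, peak 0.5
hs = np.linspace(0.02, 0.3, 15)
h_best = hs[np.argmin(loo_cv(x, y, hs))]
fit = nw(x, x, y, h_best)
print("chosen h:", h_best, "estimated peak at x =", x[np.argmax(fit)])
```

The same leave-one-out criterion carries over to the spline estimator, with the candidate set of bandwidths replaced by candidate numbers of knots.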
|
420 |
An analysis of the technical efficiency in Hong Kong's construction industry. Wang, You-song, 王幼松. January 1998.
published_or_final_version / Real Estate and Construction / Doctoral / Doctor of Philosophy
|