  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Time Series Online Empirical Bayesian Kernel Density Segmentation: Applications in Real Time Activity Recognition Using Smartphone Accelerometer

Na, Shuang 28 June 2017 (has links)
Time series analysis has been explored by researchers in many areas, such as statistical research, engineering applications, medical analysis, and finance. To represent the data more efficiently, the mining process is supported by time series segmentation. A time series segmentation algorithm looks for the change points between two different patterns and develops a suitable model, depending on the data observed in each segment. Because of limited computing and storage capacity, it is necessary to consider an adaptive and incremental online segmentation method. In this study, we propose Online Empirical Bayesian Kernel Segmentation (OBKS), which combines Online Multivariate Kernel Density Estimation (OMKDE) and the Online Empirical Bayesian Segmentation (OBS) algorithm. This method uses the online multivariate kernel density as the predictive distribution derived by online empirical Bayesian segmentation, instead of the posterior predictive distribution. The benefit of OMKDE is that it does not require a pre-defined prior function, which makes it more adaptive and adjustable than the posterior predictive distribution. Human Activity Recognition (HAR) with smartphone-embedded sensors is a modern time series application used in many areas, such as therapeutic applications and automotive sensing. Important procedures related to the HAR problem include classification, clustering, feature extraction, dimension reduction, and segmentation. Segmentation, as the first step of HAR analysis, attempts to represent the time intervals more effectively and efficiently. The traditional segmentation method for HAR partitions the time series into short, fixed-length segments. However, these segments might not be long enough to capture sufficient information for the entire activity time interval.
In this research, we first segment the observations of each activity as a whole interval using the Online Empirical Bayesian Kernel Segmentation algorithm. A smartphone with a built-in accelerometer generates the observations of these activities. Based on the segmentation result, we introduce a two-layer random forest classification method. The first layer identifies the main group; the second layer analyzes the subgroups within each main group. We evaluate the performance of our method on six activities performed by 30 volunteers: sitting, standing, lying, walking, walking_upstairs, and walking_downstairs. Detecting walking_upstairs and walking_downstairs automatically requires more information and more detailed, complicated features, since these two activities are very similar. Accordingly, for real-time activity recognition on smartphones with embedded accelerometers, the first layer classifies activities as static or dynamic, and the second layer classifies each main group into sub-classes, depending on the first-layer result. For the data collected, we obtain an overall accuracy of 91.4% on the six activities and an overall accuracy of 100% when distinguishing only the dynamic activities (walking, walking_upstairs, walking_downstairs) from the static activities (sitting, standing, lying).
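The two-layer classification structure can be sketched with scikit-learn. Everything below is illustrative: the six-column "segment features" (per-axis means plus variance-like columns) stand in for whatever features the dissertation actually extracts, and the sub-class labels are assigned at random within each group, since only the pipeline structure is being shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Illustrative per-segment features: 3 axis means + 3 variance-like columns.
# Static activities get low spread, dynamic activities high spread.
def make_segments(n, dynamic):
    spread = rng.uniform(0.5, 1.0, (n, 3)) if dynamic else rng.uniform(0.0, 0.1, (n, 3))
    return np.hstack([rng.normal(0, 1, (n, 3)), spread])

X = np.vstack([make_segments(150, False), make_segments(150, True)])
y_main = np.array([0] * 150 + [1] * 150)            # 0 = static, 1 = dynamic
y_sub = np.concatenate([rng.integers(0, 3, 150),    # sitting/standing/lying
                        rng.integers(3, 6, 150)])   # walking/up/downstairs

# Layer 1: separate static from dynamic activities.
layer1 = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_main)

# Layer 2: one forest per main group, trained only on that group's segments.
layer2 = {g: RandomForestClassifier(n_estimators=50, random_state=0)
             .fit(X[y_main == g], y_sub[y_main == g]) for g in (0, 1)}

def classify(segment):
    group = layer1.predict(segment.reshape(1, -1))[0]
    return layer2[group].predict(segment.reshape(1, -1))[0]
```

The point of the hierarchy is that the easy static/dynamic split is made first, so each second-layer forest only has to discriminate among three similar activities.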

Cybersecurity: Probabilistic Behavior of Vulnerability and Life Cycle

Rajasooriya, Sasith Maduranga 28 June 2017 (has links)
Analysis of vulnerabilities and the vulnerability life cycle is at the core of cybersecurity-related studies. The vulnerability life cycle discussed by S. Frei, and studies by several other scholars, have noted the importance of this approach. The application of statistical methodologies in cybersecurity-related studies calls for a great deal of new information. Using currently available data from the National Vulnerability Database, this study develops and presents a set of useful statistical tools to be applied in cybersecurity-related decision making. In the present study, the concept of a vulnerability space is defined as a probability space. Relevant theoretical analyses are conducted, and observations in the vulnerability space regarding events and states are discussed. Transforming IT-related cybersecurity issues into an analytical formulation, so that abstract and conceptual knowledge from mathematics and statistics can be applied, is a challenge. However, to overcome rising threats from cyber-attacks, such an integration of analytical foundations to understand the issues and develop strategies is essential. In the present study, we apply the well-known Markov approach, in a new way, to the vulnerability life cycle to develop useful analytical methods for assessing the risk associated with a vulnerability. We also present a new risk index integrating the results obtained with details from the Common Vulnerability Scoring System (CVSS). In addition, a comprehensive study of the vulnerability space is presented, discussing the likelihood of probable events in the probability sub-spaces of vulnerabilities. Finally, an extended vulnerability life cycle model is presented and discussed in relation to states and events in the vulnerability space, laying a strong foundation for future vulnerability-related analytical research.
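The Markov treatment of a vulnerability life cycle can be sketched as a discrete-time chain with absorbing "patched" and "exploited" states. The states, transition probabilities, and CVSS weighting below are illustrative assumptions, not the dissertation's estimates:

```python
import numpy as np

# Hypothetical 5-state vulnerability life cycle:
# 0 = not discovered, 1 = discovered, 2 = exploit released,
# 3 = patched (absorbing), 4 = exploited (absorbing)
P = np.array([
    [0.90, 0.10, 0.00, 0.00, 0.00],
    [0.00, 0.60, 0.20, 0.20, 0.00],
    [0.00, 0.00, 0.50, 0.30, 0.20],
    [0.00, 0.00, 0.00, 1.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 1.00],
])

state = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # starts undiscovered

# State distribution after t steps: pi_t = pi_0 @ P^t
for _ in range(200):
    state = state @ P

# A simple CVSS-weighted risk index: P(exploited) scaled by a base score.
cvss_base = 7.5  # assumed CVSS score for this vulnerability
risk_index = state[4] * cvss_base
```

Iterating the transition matrix gives the long-run probability that the vulnerability ends up exploited rather than patched, which is the quantity a risk index of this kind would weight by severity.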

Bayesian Artificial Neural Networks in Health and Cybersecurity

Rodrigo, Hansapani Sarasepa 03 July 2017 (has links)
In the era of big data, the applicability and importance of data-driven models such as artificial neural networks (ANNs) in modern statistics have increased substantially. In this dissertation, our main goal is to contribute to the development and expansion of these ANN models by incorporating Bayesian learning techniques. We demonstrate the applicability of these Bayesian ANN models in interdisciplinary research, including health and cybersecurity. Breast cancer is one of the leading causes of death among females. Early and accurate diagnosis is a critical component that decides the survival of patients. Including the well-known "Gail Model", numerous efforts have been made to quantify the risk of malignant breast cancer. However, these models impose some limitations on their use for risk prediction. In this dissertation, we developed a diagnosis model using an ANN to identify potential breast cancer patients from their demographic factors and previous mammogram results. While developing the model, we applied Bayesian regularization techniques (the evidence procedure), along with the automatic relevance determination (ARD) prior, to minimize network over-fitting. The optimal Bayesian network has 81% overall accuracy in correctly classifying the actual status of breast cancer patients, 59% sensitivity in detecting malignancy, and 83% specificity in detecting non-malignancy. The area under the receiver operating characteristic curve (0.7940) shows that this is a moderate classification model. We then present a new Bayesian ANN model for developing a nonlinear Poisson regression model that can be used for count data modeling. Here, we summarize all the important steps involved in developing the ANN model, including the forward-propagation, backward-propagation, and error-gradient calculations of the newly developed network.
As part of this, we introduce a new activation function into the output layer of the ANN and an error-minimizing criterion for count data. Moreover, we expand our model to incorporate Bayesian learning techniques. The performance of our model is tested using simulated data. In addition, a piecewise constant hazard model is developed by extending the above nonlinear Poisson regression model under the Bayesian setting. This model can be used in place of other conventional methods for accurate survival time prediction. With this, we were able to significantly improve the prediction accuracies. We captured the uncertainty of our predictions by incorporating error bars, which could not be achieved with a linear Poisson model due to the overdispersion in the data. We also propose a new hybrid learning technique, and we evaluate its performance with varying numbers of hidden nodes and data sizes. Finally, we demonstrate the suitability of Bayesian ANN models for time series forecasting by using an online training algorithm. We developed a vulnerability forecast model for the Linux operating system using this approach.
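The nonlinear Poisson regression idea, a network whose output layer uses an exponential activation so the predicted rate stays positive, trained against the Poisson negative log-likelihood, can be sketched in plain NumPy. This is a minimal non-Bayesian sketch with made-up data, not the dissertation's network or its evidence-procedure regularization:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated counts whose rate depends nonlinearly on one covariate.
x = rng.uniform(-2, 2, (500, 1))
lam_true = np.exp(1.0 + np.sin(x))
y = rng.poisson(lam_true)

# One-hidden-layer network; exp() on the output layer keeps the Poisson
# rate positive, and the loss is the Poisson negative log-likelihood.
H = 8
W1, b1 = rng.normal(0, 0.5, (1, H)), np.zeros(H)
W2, b2 = rng.normal(0, 0.5, (H, 1)), np.zeros(1)

lr = 0.01
for _ in range(5000):
    h = np.tanh(x @ W1 + b1)            # forward propagation
    eta = h @ W2 + b2
    lam = np.exp(eta)                   # exp output activation -> rate > 0
    g_eta = (lam - y) / len(x)          # d(-loglik)/d(eta) for Poisson
    g_W2, g_b2 = h.T @ g_eta, g_eta.sum(0)
    g_h = g_eta @ W2.T * (1 - h ** 2)   # back-propagation through tanh
    g_W1, g_b1 = x.T @ g_h, g_h.sum(0)
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2

lam_hat = np.exp(np.tanh(x @ W1 + b1) @ W2 + b2)
```

The gradient of the Poisson negative log-likelihood with respect to the pre-activation is simply `lam - y`, which is what makes the exp output layer such a natural fit for count data.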

Analysis of a Potential A(H7N9) Influenza Pandemic Outbreak in the U.S.

Silva Sotillo, Walter A. 22 June 2017 (has links)
This dissertation presents a collection of manuscripts that describe the development and implementation of models to analyze the impact of a potential A(H7N9) pandemic influenza outbreak in the U.S. Though this virus is still only animal-to-human transmittable, it has the potential to become human-to-human transmittable and trigger a pandemic. This work is motivated by the negative impact on human lives that this virus has already caused in China, and is intended to support public health officials in preparing to protect the U.S. population from a potential outbreak of pandemic scale. An agent-based (AB) simulation model is used to replicate the social dynamics of the contacts between infected and susceptible individuals. At the end of each day, the model updates the status of all individuals by estimating the infection probabilities, considering the contact process and the contagiousness of the infected individuals given by the disease natural history of the virus. The model is implemented on sample outbreak scenarios in selected regions of the U.S., and the sampling results are used to estimate the disease burden for the whole country. The results are also used to examine the impact of various virus strengths as well as the efficacy of different intervention strategies in mitigating the pandemic burden. This dissertation also characterizes the infection time during an A(H7N9) influenza pandemic. Continuous distributions, including the exponential, Weibull, and lognormal, are considered as candidate models for the infection time. Based on the negative log-likelihood, the lognormal distribution provides the best fit. Such characterization is important, as many critical questions about the pandemic impact can be answered using the fitted distribution. Finally, the dissertation focuses on assessing community preparedness for pandemic outbreaks using resilience as a measure.
Resilience considers the ability to recover quickly from a pandemic outbreak and is defined as a function of the percentage of the healthy population at any time. The analysis, estimates, and metrics presented in this dissertation are new contributions to the literature, and they offer helpful perspectives for public health decision makers preparing for a potential A(H7N9) pandemic threat.
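The candidate-distribution comparison described above, fitting exponential, Weibull, and lognormal models and ranking them by negative log-likelihood, can be sketched with SciPy. The infection times here are simulated (lognormal by construction) as a stand-in for times extracted from the agent-based simulation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Illustrative infection times in days (lognormal by construction).
times = rng.lognormal(mean=1.5, sigma=0.6, size=1000)

candidates = {
    "exponential": (stats.expon, {}),
    "weibull": (stats.weibull_min, {"floc": 0}),
    "lognormal": (stats.lognorm, {"floc": 0}),
}

# Fit each candidate by maximum likelihood and record its negative
# log-likelihood; the smallest value indicates the best fit.
nll = {}
for name, (dist, kwargs) in candidates.items():
    params = dist.fit(times, **kwargs)
    nll[name] = -np.sum(dist.logpdf(times, *params))

best = min(nll, key=nll.get)
```

Fixing `floc=0` keeps the Weibull and lognormal fits in their standard two-parameter forms, which is the usual choice for strictly positive duration data.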

Geospatial and Negative Binomial Regression Analysis of Culex nigripalpus, Culex erraticus, Coquillettidia perturbans, and Aedes vexans Counts and Precipitation and Land use Land cover Covariates in Polk County, Florida

Wright, Joshua P. 17 May 2017 (has links)
Although mosquito monitoring systems in the form of dry-ice-baited CDC light traps and sentinel chickens are used by mosquito control personnel in Polk County, Florida, their placement is random and does not necessarily reflect the prevalent areas of vector mosquito populations. This can result in significant health, economic, and social impacts during disease outbreaks. Of these vector mosquitoes, Culex nigripalpus, Culex erraticus, Coquillettidia perturbans, and Aedes vexans are present in Polk County and are known to transmit multiple diseases, posing a public health concern. This study evaluates the effect of unique Land use Land cover (LULC) features and precipitation on the spatial and temporal distribution of Cx. nigripalpus, Cx. erraticus, Cq. perturbans, and Ae. vexans in Polk County, Florida, during 2013 and 2014, using negative binomial regression on count data from eight environmentally unique light traps obtained from Polk County Mosquito Control. The negative binomial regression revealed a statistical association among mosquito species for precipitation and LULC features during the two-year study period, with precipitation proving to be the most significant factor in mosquito counts. The findings from this study can aid in more precise targeting of mosquito species, saving time and resources for already stressed public health services.

Time Dependent Kernel Density Estimation: A New Parameter Estimation Algorithm, Applications in Time Series Classification and Clustering

Wang, Xing 23 May 2016 (has links)
The Time Dependent Kernel Density Estimation (TDKDE) developed by Harvey & Oryshchenko (2012) is kernel density estimation adjusted by the Exponentially Weighted Moving Average (EWMA) weighting scheme. The Maximum Likelihood Estimation (MLE) procedure for estimating the parameters proposed by Harvey & Oryshchenko (2012) is easy to apply but has two inherent problems. In this study, we evaluate the performance of the probability density estimation in terms of the uniformity of Probability Integral Transforms (PITs) for various kernel functions combined with different preset numbers. Furthermore, we develop a new estimation algorithm, which can be implemented with artificial neural networks, to eliminate the inherent problems of the MLE method and to improve the estimation performance. Based on the new estimation algorithm, we develop a TDKDE-based random forests time series classification algorithm that is significantly superior to the commonly used statistical-feature-based random forests method as well as the Kernel Density Estimation (KDE)-based random forests approach. Furthermore, the proposed TDKDE-based Self-Organizing Map (SOM) clustering algorithm is demonstrated to be superior to the widely used Discrete Wavelet Transform (DWT)-based SOM method in terms of the Adjusted Rand Index (ARI).
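The core EWMA-weighting idea behind TDKDE can be sketched in a few lines: the density estimate at the latest time point down-weights older observations geometrically. This is a minimal sketch of the weighting scheme only, with an assumed Gaussian kernel and hand-picked bandwidth and decay parameter, not Harvey & Oryshchenko's full estimator:

```python
import numpy as np

def tdkde(samples, grid, bandwidth, omega):
    """EWMA-weighted KDE at the latest time point: the observation j steps
    in the past gets weight proportional to omega**j (0 < omega < 1)."""
    T = len(samples)
    w = omega ** np.arange(T - 1, -1, -1)   # most recent sample weighted most
    w /= w.sum()
    # Gaussian kernel evaluated on the grid for every sample, then weighted.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    k = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return k @ w

# A series whose level drifts: early values near 0, recent values near 3.
rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0, 1, 400), rng.normal(3, 1, 100)])
grid = np.linspace(-4, 7, 221)

dens = tdkde(x, grid, bandwidth=0.5, omega=0.98)
```

Because the recent observations dominate the weights, the estimated density concentrates near the current level of the series rather than averaging over its whole history; the MLE issues the abstract refers to concern choosing `omega` and `bandwidth` from the data.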

The Effects of Age and Gender on Pedestrian Traffic Injuries: A Random Parameters and Latent Class Analysis

Raharjo, Tatok Raharjo 21 June 2016 (has links)
Pedestrians are vulnerable road users because they have no protection while they walk, unlike cyclists and motorcyclists, who often have at least helmet protection and sometimes additional body protection (in the case of motorcyclists, body-armored jackets and pants). In the US, pedestrian fatalities are increasing and becoming an ever larger proportion of overall roadway fatalities (NHTSA, 2016), underscoring the need to study the factors that influence pedestrian-injury severity and potentially develop appropriate countermeasures. One of the critical elements in the study of pedestrian-injury severities is understanding how injuries vary across age and gender, two elements that have been shown to be critical injury determinants in past research. In the current research effort, 4,829 police-reported pedestrian crashes from Chicago in 2011 and 2012 are used to estimate multinomial logit, mixed logit, and latent class logit models of the effects of age and gender on the resulting injury severities in pedestrian crashes. The results of these model estimations show that the injury severity levels for older males, younger males, older females, and younger females are statistically different. Moreover, the overall findings show that older males and older females are more likely to sustain higher injury-severity levels in many instances (when a crash occurs on city streets or state-maintained urban roads, the primary cause of the crash is failure to yield right-of-way, the pedestrian is entering/leaving/crossing away from an intersection, the road surface is dry, or the road functional class is a local road or street). The findings suggest that well-designed and well-placed crosswalks, small islands in two-way streets, narrow streets, clear road signs, provisions for resting places, and wide, flat sidewalks all have the potential to lower pedestrian-injury severities across age/gender combinations.

Parametric, Nonparametric and Semiparametric Approaches in Profile Monitoring of Poisson Data

Piri, Sepehr 01 January 2017 (has links)
Profile monitoring is a relatively new approach in quality control, best used when the process data follow a profile (or curve). The majority of previous studies in profile monitoring focused on parametric modeling of either linear or nonlinear profiles under the assumption of correct model specification. Our work considers those cases where the parametric model for the family of profiles is unknown, or at least uncertain. Consequently, we consider monitoring Poisson profiles via three methods: a nonparametric (NP) method using penalized splines, an NP method using wavelets, and a semiparametric (SP) procedure that combines parametric and NP profile fits. Our simulation results show that the SP method is robust to the common problem of misspecification of the user's proposed parametric model. We also show that Haar wavelets are a better choice than penalized splines in situations where a sudden, sharp jump occurs, while penalized splines are better than wavelets when the shape of the profile is smooth. The proposed techniques are applied to a real data set and compared with some state-of-the-art methods.
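Fitting a single Poisson profile with a penalized spline can be sketched as penalized iteratively reweighted least squares under a log link. The basis (cubic truncated powers), knot placement, and penalty weight below are illustrative choices, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(6)

# One Poisson profile observed along the process variable x.
x = np.linspace(0, 1, 100)
mu_true = np.exp(1.5 + np.sin(2 * np.pi * x))
y = rng.poisson(mu_true).astype(float)

# Penalized-spline basis: cubic polynomial plus truncated cubics at knots.
knots = np.linspace(0.05, 0.95, 20)
B = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3] +
                    [np.maximum(x - k, 0.0) ** 3 for k in knots])

# Ridge penalty on the knot coefficients only (the penalized-spline idea).
lam_pen = 1.0
D = np.diag([0.0] * 4 + [1.0] * len(knots))

# Penalized Poisson IRLS, started from a least-squares fit to log(y + 1).
beta = np.linalg.lstsq(B, np.log(y + 1), rcond=None)[0]
for _ in range(30):
    eta = B @ beta
    mu = np.exp(eta)
    z = eta + (y - mu) / mu             # working response for the log link
    W = mu                              # IRLS weights
    beta = np.linalg.solve(B.T @ (W[:, None] * B) + lam_pen * D,
                           B.T @ (W * z))

mu_hat = np.exp(B @ beta)
```

Because the penalty only touches the knot coefficients, the smooth polynomial trend is fitted without shrinkage; a Haar-wavelet version would replace `B` with a wavelet design matrix, which handles sharp jumps better at the cost of smooth regions.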

Modeling in Finance and Insurance With Lévy-Itô Driven Dynamic Processes under Semi-Markov-type Switching Regimes and Time Domains

Assonken Tonfack, Patrick Armand 30 March 2017 (has links)
Mathematical and statistical modeling has been at the forefront of many significant advances across disciplines, in both the academic and industry sectors. From the behavioral sciences to hard-core quantum mechanics in physics, mathematical modeling has made a compelling argument for its usefulness and its necessity in advancing the current state of knowledge in the 21st century. In finance and insurance in particular, stochastic modeling has proven to be an effective approach to accomplishing a vast array of tasks: risk management, leveraging of investments, prediction, hedging, pricing, insurance, and so on. However, the magnitude of the damage incurred in the market crises of 1929 (the Great Depression), 1937 (a recession triggered by lingering fears emanating from the Great Depression), 1990 (a one-year recession following a decade of steady expansion), and 2007 (the Great Recession triggered by the sub-prime mortgage crisis) suggests that there are certain aspects of financial markets not accounted for in existing modeling. Explanations have abounded as to why the market underwent such deep crises and how to account for regime-change risk. One such explanation is the existence of regimes in the financial markets. The basic idea of market regimes is that the market is intrinsically subject to many different states and can switch from one state to another under unknown and uncertain internal and external perturbations. This theory has been implemented in the simplifying case of Markov regimes. The mathematical simplicity of the Markovian regime model allows for semi-closed or closed-form solutions in most financial applications, while also allowing for economically interpretable parameters. However, there is a hefty price to be paid for such practical conveniences, as many of the assumptions made about market behavior are quite unreasonable and restrictive.
One assumes, for instance, that each market regime has a constant propensity of switching to any other state, irrespective of the age of the current state. One also assumes that there are no intermediate states, as regime changes occur in a discrete manner from one of the finitely many states to another. There is therefore no telling how meaningful or reliable the interpretation of parameters in Markov regime models is. In this thesis, we introduce a sound theoretical and analytical framework for Lévy-driven linear stochastic models under a semi-Markov market regime switching process, and derive an Itô formula for a general linear semi-Markov switching model generated by a class of Lévy-Itô processes (1). The Itô formula results in two important byproducts, namely semi-closed-form formulas for the characteristic function of log prices and for a linear combination of duration times (2). Unlike Markov markets, the introduction of semi-Markov markets allows a time-varying propensity of regime change through the conditional intensity matrix. This is more in line with the notion that the market's chances of recovery (respectively, of crisis) are affected by the recession's age (respectively, the recovery's age). Such a change is consistent with the notion that, for instance, the longer the market is mired in a recession, the more improbable a fast recovery becomes, as the market is more likely to either worsen or undergo a slow recovery. Another interesting consequence of the time dependence of the conditional intensity matrix is the interpretation of semi-Markov regimes as a pseudo-infinite market regime model. Although semi-Markov regimes assume a finite number of states, we note that while in any given regime, the market does not stay the same but goes through an infinite number of changes in its propensity of switching to other regimes.
Each of those separate intermediate states endows the market with a structure of pseudo-infinite regimes, which answers the long-standing problem of modeling market regimes with infinitely many regimes. We developed a version of Girsanov's theorem specific to semi-Markov regime switching stochastic models, a crucial contribution in relating the risk-neutral parameters to the historical parameters (3). Given that Lévy-driven markets and regime switching markets are incomplete, there is more than one risk-neutral measure that one can use for pricing derivative contracts. Although much work has been done on the optimal choice of the pricing measure, two stand out in the current literature: the minimal martingale measure and the minimum entropy martingale measure. We first present a general version of Girsanov's theorem explicitly accounting for semi-Markov regimes. Then we present the Siu and Yang pricing kernel. In addition, we develop the conditional and unconditional minimum entropy martingale measures, which minimize the dissimilarity between the historical and risk-neutral probability measures through a version of the Kullback-Leibler distance (4). Estimation of a European option price in a semi-Markov market has been attempted before in the restricted case of the Black-Scholes model. The problems encountered there were twofold. First, the author employed Markov chain Monte Carlo methods, which relied heavily on the tractability of the likelihood function of the normal random sequences; this tractability is unavailable for most Lévy processes, so alternative pricing methods are essential. Second, the accuracy of the parameter estimates required tens of thousands of simulations, as is often the case with Metropolis-Hastings algorithms, with considerable CPU time demands.
Both of the issues outlined above are resolved by the development of a semi-closed-form expression for the characteristic function of log asset prices, which opens the door to a Fourier transform method derived on the heels of the Carr and Madan algorithm and the Fourier time-stepping algorithm (5). A round of simulations and calibrations is performed to better capture the performance of the semi-Markov model as opposed to Markov regime models. We establish through simulations that the semi-Markov parameters and the backward recurrence time have a substantial effect on option prices (6). Differences between Markov and semi-Markov market calibrations are quantified, and the CPU times are reported. More importantly, the interpretation of risk-neutral semi-Markov parameters offers more insight into the dynamics of market regimes than Markov market regime models (7). This is systematically exhibited in this work, as calibration results obtained from a set of European vanilla call options led to estimates of the shape and scale parameters of the Weibull distribution considered, offering a deeper view of the current market state, as these parameters determine the in-regime dynamics crucial to determining where the market is headed. After introducing semi-Markov models through linear Lévy-driven models, we consider semi-Markov markets with nonlinear multidimensional coupled asset price processes (8). We establish that the tractability of linear semi-Markov market models carries over to multidimensional nonlinear asset price models. Estimating equations and pricing formulas are derived for the historical parameters and risk-neutral parameters, respectively (9). The particular case of a basket of commodities is explored, and we provide calibration formulas for the model parameters to observed historical commodity prices through the LLGMM method. We also study the case of the Heston model in a semi-Markov switching market where only one parameter is subject to semi-Markov regime changes.
The Heston model is one of the most popular models in option pricing, as it reproduces many more stylized facts than the Black-Scholes model while retaining tractability. However, in addition to producing volatility smiles that decay faster than observed, one of the most damning shortcomings of most diffusion models, including the Heston model, is their inability to accurately reproduce short-term option prices. One avenue for addressing these issues is to generalize the Heston model to account for semi-Markov market regimes. Such a solution is implemented, and a semi-analytic formula for options is obtained.
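The key modeling difference, sojourn times with non-constant hazard, can be sketched by simulating a two-regime semi-Markov market with Weibull regime durations and regime-dependent drift and volatility. All parameter values below are illustrative, not the dissertation's estimates:

```python
import numpy as np

rng = np.random.default_rng(7)

# Two-regime semi-Markov market: 0 = calm, 1 = crisis.
shape = {0: 1.5, 1: 0.8}      # Weibull shape of the sojourn time per regime
scale = {0: 250.0, 1: 60.0}   # sojourn scale in trading days
mu = {0: 0.08, 1: -0.05}      # annualized drift per regime
sigma = {0: 0.15, 1: 0.35}    # annualized volatility per regime

T, dt = 2520, 1 / 252         # ten years of daily steps
regime = np.empty(T, dtype=int)
t, state = 0, 0
while t < T:
    # Draw a Weibull sojourn time, then switch to the other regime.
    stay = max(1, int(scale[state] * rng.weibull(shape[state])))
    regime[t:t + stay] = state
    t += stay
    state = 1 - state

# Log-price path with regime-dependent drift and volatility.
mu_t = np.array([mu[s] for s in regime])
sig_t = np.array([sigma[s] for s in regime])
log_ret = (mu_t - 0.5 * sig_t ** 2) * dt + sig_t * np.sqrt(dt) * rng.normal(size=T)
price = 100 * np.exp(np.cumsum(log_ret))
```

Unlike a Markov chain, the hazard of leaving a regime here depends on how long the regime has lasted (through the Weibull sojourn distribution); that dependence on elapsed time is exactly the backward-recurrence-time effect on option prices that the abstract describes.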

Comparison of Some Improved Estimators for Linear Regression Model under Different Conditions

Shah, Smit 24 March 2015 (has links)
The multiple linear regression model plays a key role in statistical inference, and it has extensive applications in business and in the environmental, physical, and social sciences. Multicollinearity has been a considerable problem in multiple regression analysis: when the regressor variables are multicollinear, it becomes difficult to make precise statistical inferences about the regression coefficients. The statistical methods discussed in this thesis are the ridge regression, Liu, two-parameter biased, and LASSO estimators. First, an analytical comparison on the basis of risk was made among the ridge, Liu, and LASSO estimators under an orthonormal regression model. I found that LASSO dominates the least squares, ridge, and Liu estimators over a significant portion of the parameter space in large dimensions. Second, a simulation study was conducted to compare the performance of the ridge, Liu, and two-parameter biased estimators by the mean squared error criterion. I found that the two-parameter biased estimator performs better than the corresponding ridge regression estimator. Overall, the Liu estimator performs better than both the ridge and two-parameter biased estimators.
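A simulation comparison of this kind, measuring the mean squared error of coefficient estimates under a multicollinear design, can be sketched with scikit-learn. The design, true coefficients, and penalty levels below are illustrative choices (and the thesis's Liu and two-parameter biased estimators are not in scikit-learn, so only OLS, ridge, and LASSO are shown):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(8)

# Multicollinear design: two regressors are near-copies of a common factor.
n, reps = 100, 200
beta = np.array([2.0, 0.0, 1.0])   # one truly inactive coefficient

mse = {"ols": 0.0, "ridge": 0.0, "lasso": 0.0}
for _ in range(reps):
    f = rng.normal(size=n)
    X = np.column_stack([f + 0.05 * rng.normal(size=n),
                         f + 0.05 * rng.normal(size=n),
                         rng.normal(size=n)])
    y = X @ beta + rng.normal(size=n)
    for name, est in [("ols", LinearRegression()),
                      ("ridge", Ridge(alpha=1.0)),
                      ("lasso", Lasso(alpha=0.05))]:
        b = est.fit(X, y).coef_
        mse[name] += np.sum((b - beta) ** 2) / reps   # coefficient MSE
```

With two nearly identical regressors, the OLS coefficients have enormous variance along the near-singular direction, so the biased-but-stable ridge estimator wins on MSE, which is the phenomenon the thesis's comparisons quantify for the Liu and two-parameter estimators as well.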
