371 |
Detection of erroneous payments utilizing supervised and unsupervised data mining techniques Yanik, Todd E. 09 1900
Approved for public release; distribution is unlimited. / In this thesis we develop a procedure for detecting erroneous payments in the Defense Finance and Accounting Service, Internal Review's (DFAS IR) Knowledge Base Of Erroneous Payments (KBOEP), using supervised (Logistic Regression) and unsupervised (Classification and Regression Trees (C & RT)) modeling algorithms. S-Plus software was used to construct a supervised model of vendor payment data using Logistic Regression, along with the Hosmer-Lemeshow Test for assessing the predictive ability of the model. The Clementine Data Mining software was used to construct both supervised and unsupervised models of vendor payment data using the Logistic Regression and C & RT algorithms. The Logistic Regression algorithm in Clementine generated a model with predictive probabilities, which were compared against those of the C & RT algorithm. In addition to comparing the predictive probabilities, Receiver Operating Characteristic (ROC) curves were generated for both models to determine which model provided the best results for a Coincidence Matrix's True Positive, True Negative, False Positive and False Negative Fractions. The best modeling technique was C & RT, and it was given to DFAS IR to assist in reducing the manual record selection process currently being used. A recommended ruleset was provided, along with a detailed explanation of the algorithm selection process. / Lieutenant Commander, United States Navy
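The four coincidence-matrix fractions and the ROC curve mentioned in this abstract can be illustrated with a minimal sketch. The function names, the toy labels and the toy scores below are assumptions made for illustration only; they are not taken from the thesis, from S-Plus or from Clementine.

```python
def coincidence_fractions(labels, scores, threshold):
    """Return (TPF, TNF, FPF, FNF) for 0/1 labels and predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, scores):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    positives = tp + fn  # records that truly are erroneous payments
    negatives = tn + fp  # records that truly are not
    return (tp / positives, tn / negatives, fp / negatives, fn / positives)


def roc_points(labels, scores):
    """(FPF, TPF) pairs, one per distinct score used as a threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tpf, _, fpf, _ = coincidence_fractions(labels, scores, t)
        points.append((fpf, tpf))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (FPF, TPF) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: 1 = erroneous payment, score = predicted probability.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

fractions = coincidence_fractions(labels, scores, 0.5)
auc = area_under_curve(roc_points(labels, scores))
```

Comparing two models then amounts to computing these quantities from each model's predicted probabilities on the same records and preferring the curve that lies closer to the top-left corner.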
|
372 |
Analyzing the effects of Urban combat on daily casualty rates Yazilitas, Hakan 06 1900
Approved for public release; distribution is unlimited / This study explores whether the attacker's daily casualty rate (DCR) changes according to the terrain. The data set is part of a larger database, the Division Level Engagement Database from the Dupuy Institute. There are data on 253 battles, 96 of which occurred in urban areas. All the engagements are selected from the European Theater of Operations (ETO) in World War II. The available data set contains measurements about the battles such as initial strengths, daily casualties, terrain, front width, linear density, attacker's and defender's country, and armor losses. Hypothesis tests are used to determine whether the DCR is different in urban operations. A linear regression model is constructed to predict outcomes of similar engagements and to see the effect of each variable. It is concluded that the attacker's daily casualty rate is, on average, lower in urban operations. Terrain and force ratio are the most influential drivers of the daily casualty rate. In addition, it is seen that Allied forces (U.S., U.K. and Canada) had a different approach to Military Operations on Urban Terrain than Soviet and German forces. The Allies used extensive combat power in urban operations. / First Lieutenant, Turkish Army
|
373 |
Using discrete event simulation to assess obstacle location accuracy in the REMUS unmanned underwater vehicle Allen, Timothy E. 06 1900
is shown to follow an exponential distribution. These three models enable operators to explore the impact of various inputs prior to programming the vehicle, thus allowing them to choose the best combination of vehicle parameters that minimize the offset error between the reported and actual locations.
|
374 |
Comparison of retention characteristics over time: evidence from the 1992 and 1999 Department of Defense survey of active duty personnel Greenhoe, Richard J. 03 1900
Approved for public release, distribution is unlimited / This thesis compares characteristics that influence intended stay/leave behavior of non-prior-service junior naval officers from two different time periods. Samples of officers, under the rank of lieutenant, with less than six years of active duty service from the 1992 and 1999 Department of Defense Surveys of Active Duty Personnel were used for this analysis. Metrics for important determinants of retention were constructed using similar questions from both surveys. Logistic regression was used to identify significant influences on retention intentions in both survey years. Two composite dimensions positively affected retention intentions in both survey years: satisfaction with Service Attributes and satisfaction with Present Employment Attributes. Being female negatively affected retention intentions in both surveys. The minority variable, Black, the number of PCS moves, and having debt greater than $7,500 positively affected retention intentions, while being stationed onboard a ship, probability of finding a civilian job, and the composite dimension, satisfaction with Future Employment Attributes, negatively affected retention intentions in 1992. Influence from a significant other and the number of hours worked positively affected retention in 1999. Monetary variables were highly significant for retention intentions in 1992 but not in 1999. It is likely that the force drawdown, base closures, and a weak economy in 1992 explain these differences. / Lieutenant, United States Navy
|
375 |
Spatiotemporal Analysis of Eastern Equine Encephalitis Human Incidence Ava, Jessika Lane January 2017
Spatial and temporal components play a critical role in explaining variability across geographic regions and time, and are necessary components to space-time epidemiological research.
Until recent years, most spatial epidemiological studies have used simple space-time analyses, but the continuous advancements in statistical modeling software and geographic information systems have made more complex spatial analyses readily available. However, methods may be problematic and several ongoing statistical weaknesses have been documented, including failing to account for three significant correlative factors - spatial, temporal, and spatiotemporal autocorrelations.
Using Eastern Equine Encephalitis (EEE) human incidence data, this Master's thesis aimed to answer the research question, is there a northeastern shift in human EEE incidence within the United States, by identifying a statistical model that adjusts for spatial, temporal, and spatiotemporal autocorrelations.
This thesis introduced the spatial autoregressive distributed lag (SADL) model, a model that adjusts for spatial, temporal, and spatiotemporal autocorrelations. However, results demonstrated that EEE is too rare an event for the SADL model to be appropriate, and a non-autocorrelation model was used as the final model. Results showed that EEE incidence is significantly increasing over time for all infected regions of the United States, with a significant difference of 1.4 cases/10 million between 1964 and 2015. Results did not demonstrate a northeastern shift in EEE incidence as the northeastern US had the highest expected incidence across the entire study period (1964-1967: 2.9/10 million; 2012-2015: 6.8/10 million), but results did demonstrate that the northeastern US had the quickest increasing risk for EEE as compared to other infected regions of the US with an increase in expected incidence of 3.9/10 million between 1964 and 2015.
|
376 |
Discussion on Fifty Years of Classification and Regression Trees Rusch, Thomas, Zeileis, Achim 12 1900
In this discussion paper, we argue that the literature on tree algorithms is very fragmented. We identify possible causes and discuss good and bad sides of this situation. Among the latter is the lack of free open-source implementations for many algorithms. We argue that if the community adopts a standard of creating and sharing free open-source implementations for their developed algorithms and creates easy access to these programs, the bad sides of the fragmentation will be actively combated, which will benefit the whole scientific community. (authors' abstract)
|
377 |
Quantile regression with rank-based samples Ayilara, Olawale Fatai 01 November 2016
Quantile regression, as introduced by Koenker, R. and Bassett, G. (1978), provides a complete picture of the relationship between the response variable and covariates by estimating a family of conditional quantile functions. It also offers a natural solution to challenges such as homoscedasticity and the sometimes unrealistic normality assumption in the usual conditional mean regression. Most of the results for quantile regression are based on simple random sampling (SRS). In this thesis, we study quantile regression with rank-based sampling methods. Rank-based sampling methods have a wide range of applications in medical, ecological and environmental research, and have been shown to perform better than SRS in estimating several population parameters. We propose a new objective function which takes into account the ranking information to estimate the unknown model parameters based on the maxima or minima nomination sampling designs. We compare the mean squared error of the proposed quantile regression estimates using the maxima (or minima) nomination sampling design and observe that it provides higher relative efficiency when compared with its counterparts under the SRS design for analyzing the upper (or lower) tails of the distribution of the response variable. We also evaluate the performance of our proposed methods when ranking is done with error. / February 2017
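For orientation, the standard Koenker-Bassett check loss that underlies quantile regression under SRS can be sketched as follows. This is not the rank-based objective function proposed in the thesis, which the abstract does not spell out. The sketch fits only a constant (no covariates), and the function names and data are illustrative assumptions.

```python
def check_loss(residual, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def objective(q, ys, tau):
    """Total check loss when every observation is predicted by the constant q."""
    return sum(check_loss(y - q, tau) for y in ys)


def best_constant(ys, tau):
    """A minimizer over constants always exists at a data point, so search those."""
    return min(sorted(ys), key=lambda q: objective(q, ys, tau))


# Hypothetical sample; the values 1..9 in shuffled order.
ys = [2.0, 7.0, 1.0, 9.0, 4.0, 3.0, 8.0, 6.0, 5.0]

median_fit = best_constant(ys, 0.5)  # tau = 0.5 targets the median
upper_fit = best_constant(ys, 0.9)   # tau = 0.9 targets the upper tail
```

Replacing the constant `q` with a linear predictor in the covariates gives the usual quantile regression objective; varying `tau` traces out the family of conditional quantile functions the abstract refers to.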
|
378 |
Outcome measurement error in survival analysis Hirst, William Mark January 1998
No description available.
|
379 |
Some limit behaviors for the LS estimators in errors-in-variables regression model Chen, Shu January 1900
Master of Science / Department of Statistics / Weixing Song / There has been continuing interest among statisticians in regression models wherein the independent variables are measured with error, and there is considerable literature on the subject. In the following report, we discuss the errors-in-variables regression model $y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \epsilon_i$, $X_i = x_i + u_i$, $Z_i = z_i + v_i$, with i.i.d. errors $(\epsilon_i, u_i, v_i)$ for $i = 1, 2, \ldots, n$, and find the least squares estimators for the parameters of interest. Both weak and strong consistency for the least squares estimators $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\beta}_2$ of the unknown parameters $\beta_0$, $\beta_1$, and $\beta_2$ are obtained. Moreover, under regularity conditions, the asymptotic normality of the estimators is reported.
|
380 |
Semi-parametric estimation in Tobit regression models Chen, Chunxia January 1900
Master of Science / Department of Statistics / Weixing Song / In the classical Tobit regression model, the regression error term is often assumed to have a zero-mean normal distribution with unknown variance, and the regression function is assumed to be linear. If the normality assumption is violated, then the commonly used maximum likelihood estimate becomes inconsistent. Moreover, the likelihood function will be very complicated if the regression function is nonlinear, even if the error density is normal, which makes the maximum likelihood estimation procedure hard to implement. In the fully nonparametric setup, when both the regression function and the distribution of the error term $\epsilon$ are unknown, some nonparametric estimators for the regression function have been proposed. Although the assumption of knowing the distribution is strict, it is widely adopted in the Tobit regression literature, and is also confirmed by many empirical studies conducted in econometric research. In fact, a majority of the relevant research assumes that $\epsilon$ possesses a normal distribution with mean 0 and unknown standard deviation. In this report, we try to develop a semi-parametric estimation procedure for the regression function by assuming that the error term follows a distribution from a class of zero-mean symmetric location and scale families. A minimum distance estimation procedure for estimating the parameters in the regression function when it has a specified parametric form is also constructed. Compared with the existing semiparametric and nonparametric methods in the literature, our method would be more efficient in that more information, in particular the knowledge of the distribution of $\epsilon$, is used. Moreover, the computation is relatively inexpensive. Given that many applications do assume that $\epsilon$ has a normal or other known distribution, the current work provides some more practical tools for statistical inference in the Tobit regression model.
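As a point of reference for the classical setup this abstract starts from, the log-likelihood of a Tobit model with a linear regression function and normal errors can be sketched as below. This is the textbook baseline, not the semi-parametric or minimum distance procedure developed in the report. Left-censoring at zero, the single covariate, and all function and parameter names are assumptions made for the illustration.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def tobit_loglik(beta0, beta1, sigma, xs, ys):
    """Log-likelihood for y = max(0, beta0 + beta1 * x + eps), eps ~ N(0, sigma^2)."""
    total = 0.0
    for x, y in zip(xs, ys):
        mean = beta0 + beta1 * x
        if y > 0:
            # Uncensored observation: normal density of the residual.
            total += norm_logpdf((y - mean) / sigma) - math.log(sigma)
        else:
            # Censored observation: probability that the latent value is <= 0.
            # The floor avoids log(0) when that probability underflows.
            total += math.log(max(norm_cdf(-mean / sigma), 1e-300))
    return total


# Two hypothetical observations: one censored at zero, one observed exactly.
value = tobit_loglik(0.0, 1.0, 1.0, [0.0, 1.0], [0.0, 1.0])
```

Maximizing this function over `beta0`, `beta1` and `sigma` gives the maximum likelihood estimate whose consistency depends on the normality assumption discussed in the abstract.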
|