791 |
Gully erosion assessment and prediction on non-agricultural lands using logistic regression. Handley, Katie.
Master of Science / Department of Biological & Agricultural Engineering / Stacy L. Hutchinson / Gully erosion is a serious problem on military training lands, resulting not only in soil erosion and environmental degradation but also in increased soldier injuries and equipment damage. An assessment of gully erosion on Fort Riley was conducted to evaluate different gully location methods and to develop a gully prediction model based on logistic regression. Of the 360 sites visited, fifty-two gullies were identified, with the majority found using LiDAR-based data.
A logistic regression model was developed using topographic, land use/land cover, and soil variables. Tests for multicollinearity were used to reduce the input variables so that each model input had a unique effect on the model output. The logistic regression determined that available water content was one of the most important factors affecting the formation of gullies. Additional important factors included particle size classification, runoff class, erosion class, and drainage class.
Of the 1,577 watersheds evaluated for the Fort Riley area, 192 were predicted to have gullies. Model accuracy was approximately 79%, with an error of omission (false negative) rate of 10% and an error of commission (false positive) rate of 11%, a large improvement compared to previous methods used to locate gully erosion.
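As a hedged illustration of the workflow described above (not the thesis code), the following R sketch fits a binary logistic regression for gully presence per watershed and screens the inputs for multicollinearity with variance inflation factors. The data frame and variable names (watersheds, awc, runoff_class, and so on) are hypothetical, and the data are simulated.

    # Hypothetical watershed data: available water content (awc) plus coded
    # runoff, erosion, and drainage classes.
    set.seed(1)
    n <- 1577
    watersheds <- data.frame(
      awc            = runif(n, 0.05, 0.25),
      runoff_class   = sample(1:5, n, replace = TRUE),
      erosion_class  = sample(1:4, n, replace = TRUE),
      drainage_class = sample(1:6, n, replace = TRUE)
    )
    # Simulate gully presence so that higher awc raises the odds (illustrative only).
    eta <- -4 + 15 * watersheds$awc + 0.2 * watersheds$runoff_class
    watersheds$gully <- rbinom(n, 1, plogis(eta))

    fit <- glm(gully ~ awc + runoff_class + erosion_class + drainage_class,
               family = binomial, data = watersheds)

    library(car)   # for vif(); VIFs near 1 indicate little predictor redundancy
    vif(fit)
    summary(fit)

    # Classify watersheds at a 0.5 probability cutoff and report overall accuracy.
    pred <- as.numeric(predict(fit, type = "response") > 0.5)
    mean(pred == watersheds$gully)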
|
792 |
Ordinary least squares regression of ordered categorical data: inferential implications for practice. Larrabee, Beth R.
Master of Science / Department of Statistics / Nora Bello / Ordered categorical responses are frequently encountered in many disciplines. Examples of interest in agriculture include quality assessments, such as for soil or food products, and evaluation of lesion severity, such as teat-end status in dairy cattle. Ordered categorical responses are characterized by multiple categories or levels recorded on a ranked scale that, while conveying relative order, is not informative of the magnitude of, or the proportionality between, levels. A number of statistically sound models for ordered categorical responses have been proposed, such as logistic regression and probit models, but these are commonly underutilized in practice. Instead, the ordinary least squares linear regression model is often employed with ordered categorical responses despite violation of basic model assumptions. In this study, the inferential implications of this approach are investigated using a simulation study that evaluates robustness based on realized Type I error rate and statistical power. The design of the simulation study is motivated by applied research cases reported in the literature. A variety of plausible scenarios were considered, including various shapes of the frequency distribution and different numbers of categories of the ordered categorical response. Using a real dataset on frequency of antimicrobial use in feedlots, I demonstrate the inferential performance of ordinary least squares linear regression on ordered categorical responses relative to a probit model.
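A minimal sketch of this kind of simulation, assuming a latent normal variable cut into ordered categories and no true treatment effect, so that the rejection rate of ordinary least squares estimates the realized Type I error; the cut points and sample sizes are illustrative, and MASS::polr could be fit to the same data for the probit comparison.

    # Estimate the realized Type I error of OLS on an ordered categorical response.
    set.seed(42)
    reps <- 2000
    n <- 60
    pvals <- replicate(reps, {
      group  <- rep(c(0, 1), each = n / 2)   # two groups, no true effect
      latent <- rnorm(n)                     # latent scale, independent of group
      # Cut the latent variable into a 4-category ordered response:
      y <- cut(latent, breaks = c(-Inf, -0.5, 0.5, 1.2, Inf), labels = FALSE)
      summary(lm(y ~ group))$coefficients["group", "Pr(>|t|)"]
    })
    mean(pvals < 0.05)   # realized Type I error rate; compare with nominal 0.05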
|
793 |
Active learning via Transduction in Regression Forests. Hansson, Kim; Hörlin, Erik. January 2015.
Context. The amount of training data required to build accurate models is a common problem in machine learning. Active learning is a technique that tries to reduce the amount of required training data by actively choosing which training data holds the greatest value.

Objectives. This thesis aims to design, implement, and evaluate the Random Forests algorithm combined with active learning, suitable for predictive tasks with real-valued outcomes where the amount of training data is small. Machine learning algorithms traditionally require large amounts of training data to create a general model, and training data is in many cases sparse and expensive or difficult to create.

Methods. The research methods used for this thesis are implementation and scientific experiment. An approach to active learning was implemented based on previous work for classification-type problems. The approach uses the Mahalanobis distance to perform active learning via transduction. Evaluation was done using several data sets, where the decrease in prediction error was measured over several iterations. The results of the evaluation were then analyzed using nonparametric statistical testing.

Results. The statistical analysis of the evaluation results failed to detect a difference between our approach and a non-active-learning approach, even though the proposed algorithm showed irregular performance. The evaluations of our tree-based traversal method and of the Mahalanobis distance for transduction both showed that these methods performed better than Euclidean distance and complete graph traversal.

Conclusions. We conclude that the proposed solution did not decrease the amount of required training data to a significant degree. However, the approach has potential, and future work could lead to a working active learning solution. Further work is needed on key areas of the implementation, such as the choice of instances for active learning through transduction uncertainty, as well as the choice of method for going from a transduction model to an induction model.
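The core selection step might look like the following R sketch, which assumes, purely as an illustration and not as the authors' implementation, that the Mahalanobis distance from the labeled pool serves as the novelty measure for choosing the next instance to query.

    library(randomForest)   # CRAN Random Forests implementation
    set.seed(7)
    X <- matrix(rnorm(200 * 5), ncol = 5)
    y <- X[, 1] - 0.5 * X[, 2] + rnorm(200, sd = 0.3)
    labeled   <- 1:20                      # small initial training set
    unlabeled <- setdiff(1:200, labeled)

    for (i in 1:10) {                      # ten active-learning iterations
      fit <- randomForest(X[labeled, ], y[labeled])
      # Mahalanobis distance of each unlabeled point from the labeled pool:
      d <- mahalanobis(X[unlabeled, ],
                       center = colMeans(X[labeled, ]),
                       cov    = cov(X[labeled, ]))
      pick <- unlabeled[which.max(d)]      # query the most "novel" instance
      labeled   <- c(labeled, pick)
      unlabeled <- setdiff(unlabeled, pick)
    }
    # fit now holds the model trained on the actively chosen labeled set.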
|
794 |
A multiple regression analysis of six factors concerning school district demographics and superintendent tenure and experience in 2007-2008 schools relative to student achievement on the third grade Kansas reading assessments. Myers, Scott P.
Doctor of Education / Department of Educational Leadership / Tweed R. Ross / The purpose of this quantitative study was to examine the relationship between the length of tenure of a superintendent and academic achievement as defined by the percentage of students who scored “Proficient” or better on the 2008 Third Grade Kansas Reading Assessment. To put this relationship into context, five other predictive variables were included as a part of this study: the individual’s total length of experience as a superintendent, the individual’s total length of experience in education, each district’s assessed valuation per pupil, each district’s percentage of students who qualified for free or reduced meal prices, and each district’s total student headcount. To gain the most comprehensive view possible, all 295 Kansas school districts in existence in 2008 were included in this study.
The backward method of multiple regression was utilized to analyze these data. Before performing this analysis, the researcher first checked that the assumption of no multicollinearity had been met. All six predictive variables were retained, as no relationships among them were found to be too strong. Following this check, the backward method of multiple regression analysis was performed. This method seeks to create the most parsimonious model, so two of the predictive variables were excluded from the final summary model based on the removal criterion: the significance value of the t-test of each predictive variable.
Results of this study revealed that 9.9% of the variance in the dependent variable, the percentage of students who scored “Proficient” or better on the 2008 Third Grade Kansas Reading Assessment, was accounted for by the predictive variables in the retained model. Further, the multiple regression analysis tested the unique contributions of the four remaining predictive variables. Although superintendent tenure was among the four predictive variables with a significant effect on the percentage of students who scored “Proficient” or better, the primary focus of this study, the impact of a superintendent’s length of tenure on students’ academic achievement, proved to have the least relative impact according to beta weights.
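For illustration, a hedged R sketch of backward elimination on simulated district-level data follows. Note that R's step() removes predictors by AIC, whereas the study used the significance of each t-test as the removal criterion, so the two procedures can retain different models; all variable names and distributions here are hypothetical.

    set.seed(3)
    n <- 295   # number of Kansas districts in the study
    districts <- data.frame(
      supt_tenure  = rpois(n, 4),          # superintendent tenure in district
      supt_exp     = rpois(n, 10),         # total superintendent experience
      educ_exp     = rpois(n, 20),         # total experience in education
      valuation_pp = rlnorm(n, 11, 0.4),   # assessed valuation per pupil
      pct_free_red = runif(n, 5, 90),      # % free/reduced meal prices
      headcount    = rlnorm(n, 7, 1)       # total student headcount
    )
    # Simulated outcome with a modest poverty effect (illustrative only):
    districts$pct_proficient <- 90 - 0.15 * districts$pct_free_red +
      0.2 * districts$supt_tenure + rnorm(n, sd = 5)

    full <- lm(pct_proficient ~ supt_tenure + supt_exp + educ_exp +
                 valuation_pp + pct_free_red + headcount, data = districts)
    reduced <- step(full, direction = "backward", trace = 0)
    summary(reduced)$r.squared   # variance accounted for by the retained model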
|
795 |
Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods. Mair, Patrick; Hornik, Kurt; de Leeuw, Jan. 21 October 2009.
In this paper we give a general framework for isotone optimization. First we discuss a generalized version of the pool-adjacent-violators algorithm (PAVA) to minimize a separable convex function under simple chain constraints. Besides general convex functions, we extend existing PAVA implementations in terms of observation weights, approaches for tie handling, and responses from repeated measurement designs. Since isotone optimization problems can be formulated as convex programming problems with linear constraints, we then develop a primal active set method to solve such problems. This methodology is applied to specific loss functions relevant in statistics. Both approaches are implemented in the R package isotone.
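Since the paper's methods are implemented in the R package isotone, a minimal usage sketch of its gpava() function on synthetic data might look as follows (consult the package documentation for the exact arguments and return fields):

    # Least-squares isotone regression under a simple chain (total order).
    library(isotone)
    set.seed(5)
    z <- 1:50                          # ordered predictor
    y <- log(z) + rnorm(50, sd = 0.3)  # noisy but roughly increasing response
    fit <- gpava(z, y)                 # generalized PAVA, default squared loss
    plot(z, y)
    lines(z, fit$x, type = "s")        # fitted monotone step function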
|
796 |
Logistic regression to determine significant factors associated with share price change. Muchabaiwa, Honest. 19 February 2014.
This thesis investigates the factors associated with annual changes in the share prices of Johannesburg Stock Exchange (JSE) listed companies. In this study, a share is considered to have increased in value when the company's share price at the end of the financial year is higher than in the previous year. Secondary data sourced from the McGregor BFA website, covering 2004 to 2011, was used.
Deciding which share to buy is the biggest challenge faced by both investment companies and individuals when investing on the stock exchange. This thesis uses binary logistic regression to identify the variables that are associated with share price increase.
The dependent variable was annual change in share price (ACSP) and the independent variables were assets per capital employed ratio, debt per assets ratio, debt per equity ratio, dividend yield, earnings per share, earnings yield, operating profit margin, price earnings ratio, return on assets, return on equity and return on capital employed.
Different variable selection methods were used, and it was established that the backward elimination method produced the best model. The probability of an increase in the value of a share is higher if shareholders anticipate a higher return on capital employed and high earnings per share. It was noted, however, that the share price is negatively impacted by dividend yield and earnings yield. Since the odds of an increase in share price are higher with a higher return on capital employed and high earnings per share, investors and investment companies are encouraged to choose companies with high earnings per share and the best returns on capital employed.
The final model had a classification rate of 68.3%, and the validation sample produced a classification rate of 65.2%. / Mathematical Sciences / M.Sc. (Statistics)
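A hedged R sketch of the modeling idea, with simulated data and hypothetical variable names (acsp, roce, eps, and so on); exponentiated coefficients give the odds ratios discussed above.

    set.seed(9)
    n <- 500
    shares <- data.frame(
      roce           = rnorm(n, 15, 5),   # return on capital employed
      eps            = rnorm(n, 2, 1),    # earnings per share
      dividend_yield = rnorm(n, 3, 1),
      earnings_yield = rnorm(n, 8, 2)
    )
    # Simulate so ROCE and EPS raise, and the yields lower, the odds (illustrative):
    eta <- -1 + 0.08 * shares$roce + 0.5 * shares$eps -
      0.2 * shares$dividend_yield - 0.05 * shares$earnings_yield
    shares$acsp <- rbinom(n, 1, plogis(eta))   # 1 = annual share price increase

    fit <- glm(acsp ~ roce + eps + dividend_yield + earnings_yield,
               family = binomial, data = shares)
    reduced <- step(fit, direction = "backward", trace = 0)  # AIC-based elimination
    exp(coef(reduced))   # odds ratios: values > 1 raise the odds of an increase
    # Classification rate at a 0.5 cutoff:
    mean((predict(reduced, type = "response") > 0.5) == shares$acsp)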
|
797 |
En statistisk analys av islastens effekt på en dammkonstruktion / A statistical analysis of the ice load's effect on a dam structure. Klasson Svensson, Emil; Persson, Anton. January 2016.
A dam is a structure mainly used for storing water and generating electricity. The structure of a dam moves in a season-based pattern, mainly because of the difference between the temperature of the air on the outside of the dam and of the water on the inside. Due to the Nordic climate, icing on the water in the basin is fairly frequent, and the effects of ice on the structural load of the dam are relatively unexplored; they are the subject of this bachelor's thesis. The goal of this project is to evaluate, with multiple linear regression models and dynamic regression models, which predictors are significant to the movement of a specific Swedish dam. The movement is measured by inverted pendulums that register the dam's displacement relative to the bedrock foundation. It is of particular interest to determine whether the ice load influences the movement of the dam. The multiple linear regression models used to explain the dam's movement were all discarded due to autocorrelation in the residuals, which invalidates them because the required model assumptions are not met. To counteract the autocorrelation, dynamic models with autoregressive terms were fitted; these showed no problems with autocorrelation and managed to explain the movement of the dam significantly. The autoregressive terms proved to be efficient explanatory variables, and time, temperature, hydrostatic pressure, and ice thickness were also useful. The ice thickness shows a significant effect at the 5% significance level on two of the investigated pendulums, a notable result. The results indicate that there is reason to continue research on the impact of ice loads on dam structures.
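As a hedged illustration of a dynamic regression with autoregressive terms (simulated stand-in series, not the dam data), one could fit an AR(1) regression with external regressors in R and then check the residuals for remaining autocorrelation:

    set.seed(11)
    n <- 365
    temp  <- 10 * sin(2 * pi * (1:n) / 365) + rnorm(n)   # seasonal air temperature
    hydro <- 50 + rnorm(n)                               # hydrostatic pressure
    ice   <- pmax(0, -temp) / 10                         # crude ice-thickness proxy
    # Pendulum displacement with AR(1) dynamics plus regressor effects:
    displacement <- arima.sim(list(ar = 0.8), n) + 0.3 * temp + 0.1 * ice

    fit <- arima(displacement, order = c(1, 0, 0),
                 xreg = cbind(temp, hydro, ice))
    fit
    # Ljung-Box test: large p-values suggest no remaining autocorrelation.
    Box.test(residuals(fit), lag = 10, type = "Ljung-Box")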
|
798 |
Börsintroduktioner i Sverige: En tvärsnittsundersökning om underprissättning och prissättningsmetoder (Initial public offerings in Sweden: a cross-sectional study of underpricing and pricing methods). Panic, Stefan; Taher, Roni. January 2016.
An IPO (initial public offering) is the process by which a company for the first time makes its shares available for trading on the stock market. When determining the share price, there are two methods a company can use: bookbuilding (interval pricing) or the fixed price method. A problem that may arise in an IPO is that the offer price is not valued equivalently to the estimated market value, which is called underpricing. The aim of this study is to estimate the magnitude of any underpricing and to compare the impact of various factors on it. Furthermore, the study aims to identify how the choice of pricing method affects potential underpricing on the Swedish stock market. In a sample of 149 observations from 2005-2015, the regression analysis implies that the average market-adjusted underpricing is 4.61%. We find that a fixed price method generates a higher average initial return and that the current market rate of return has the greatest impact on underpricing.
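A minimal R sketch of the cross-sectional regression, using 149 simulated IPOs and hypothetical variable names (initial_return, fixed_price, market_return):

    set.seed(13)
    n <- 149
    ipos <- data.frame(
      fixed_price   = rbinom(n, 1, 0.4),    # 1 = fixed price, 0 = bookbuilding
      market_return = rnorm(n, 0.05, 0.1)   # prevailing market return
    )
    # Simulated first-day return with a fixed-price premium (illustrative only):
    ipos$initial_return <- 0.02 + 0.03 * ipos$fixed_price +
      0.5 * ipos$market_return + rnorm(n, sd = 0.05)

    fit <- lm(initial_return ~ fixed_price + market_return, data = ipos)
    summary(fit)
    mean(ipos$initial_return)   # average underpricing in the synthetic sample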
|
799 |
Evaluation of Efficiency in the Activation and Accrual of Interventional Clinical Trials at Cancer Centers. Tate, Wendy Rose. January 2016.
Background: Clinical trials represent a significant percentage of the time and cost to bring a drug through the development process and to Food and Drug Administration approval. Despite how critical these trials are to the drug development process, many studies are underpowered due to low accrual. This means that valuable questions regarding the safety and effectiveness of new agents are left unanswered, requiring additional time and studies. A call for reform of the industry has been made by stakeholders in the clinical research enterprise; however, national change is slow. Thus, sites that conduct clinical research must find methods to increase efficiency within the burdensome system currently in place. Across cancer centers adhering to the National Cancer Institute (NCI) Cancer Center Support Grant guidelines, efficiencies have been explored individually; however, there is a gap in knowledge about what factors affect sites system-wide. This dissertation examines factors that affect clinical trial efficiency in the area of study activation, with local clinical trial accrual as the outcome. Methods: Protocol- and site-specific clinical trial administration data were collected for closed, interventional treatment and supportive care clinical trials from cancer centers adhering to NCI Cancer Center Support Grant guidelines during a five-year period (2009-2014). Study characteristic analyses and hierarchical regression modeling were used to explore the effect of feasibility committee use and protocol workload on clinical trial accrual and time to activate a clinical trial. Sensitivity analyses were used when considering protocol workload to account for studies that had not yet closed to accrual and thus were not included in this dataset. In addition, protocol- and site-specific variables were used to build regression models to predict clinical trial accrual. Sensitivity, specificity, and accuracy were compared to the current standard, the institutional disease team. Results: Sixteen centers contributed a total of 5,787 protocols (range 93-697 studies). These studies accrued 49,319 subjects. Of all studies, 1,053 (18%) accrued zero subjects. Disease teams predicted 221% of actual accrual. Seven institutions submitted protocol workload information for 2,133 studies (36.9%) and 14,229 accruals (28.9%). Controlling for effect modifiers and interactions, and adjusting for institution, a statistically significant increase in clinical trial accrual and decrease in activation time were seen with the use of a feasibility committee. Regulatory protocol workload was significantly associated with clinical trial accrual and activation time; however, no single, definitive protocol workload was identified that both minimized activation time and maximized clinical trial accrual. Protocol workload most often maximized accrual at 3.5 to 5.0 protocols per staff member/FTE and minimized activation time at 1.0 to 1.9 protocols per staff member/FTE. Regression models predicted accrual more accurately than disease teams at all 16 centers, with site-specific models consistently performing best (versus an adjusted, hierarchical model). Conclusion: Despite institutional differences in how variables were associated with accrual and activation times, the use of a feasibility committee was shown to improve clinical trial accrual and decrease activation time.
Using systematic methods for examining study activation and accrual efficiencies resulted in the development of models that predicted clinical trial accrual better than the current standard (disease team prediction) at all participating centers. Further research is needed to better define and determine optimal workload. This information and these models may better inform study planning and resource allocation decisions by local stakeholders (administrators and investigators) in the clinical research enterprise.
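As a hedged sketch of the adjusted, hierarchical modeling described above (hypothetical data and names, assuming a Poisson response for per-protocol accrual and a random intercept per center):

    library(lme4)   # mixed-effects models
    set.seed(17)
    n <- 1000
    trials <- data.frame(
      center      = factor(sample(1:16, n, replace = TRUE)),  # 16 cancer centers
      feasibility = rbinom(n, 1, 0.5),   # feasibility committee used?
      workload    = runif(n, 1, 6)       # protocols per staff member/FTE
    )
    # Simulate accrual counts with a per-center effect (illustrative only):
    center_eff <- rnorm(16, 0, 0.3)
    mu <- exp(1.0 + 0.3 * trials$feasibility - 0.1 * trials$workload +
                center_eff[as.integer(trials$center)])
    trials$accrual <- rpois(n, mu)

    fit <- glmer(accrual ~ feasibility + workload + (1 | center),
                 family = poisson, data = trials)
    summary(fit)   # fixed effects for feasibility and workload, center variance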
|
800 |
Brain perfusion imaging: performance and accuracy. Zhu, Fan. January 2013.
Brain perfusion weighted images acquired using dynamic contrast studies have an important clinical role in acute stroke diagnosis and treatment decisions. The purpose of my PhD research is to develop novel methodologies for improving the efficiency and quality of brain perfusion-imaging analysis so that clinical decisions can be made more accurately and in a shorter time. This thesis consists of three parts.

First, my research investigates the potential of parallel computing to make perfusion-imaging analysis faster and thereby deliver results used in stroke diagnosis earlier. Brain perfusion analysis using local arterial input function (AIF) techniques takes a long time to execute due to its heavy computational load. As time is vitally important in acute stroke, reducing analysis time, and therefore diagnosis time, can reduce the number of brain cells damaged and improve the chances of patient recovery. We present the implementation of a deconvolution algorithm for brain perfusion quantification on GPGPU (general-purpose computing on graphics processing units) using the CUDA programming model. Our method aims to accelerate the process without any quality loss.

Second, specific features of the perfusion source images are used to reduce the impact of noise, which consequently improves the accuracy of hemodynamic maps. The majority of existing approaches for denoising CT images are optimized for 3D (spatial) information, including spatial decimation (spatially weighted mean filters) and techniques based on wavelet and curvelet transforms. However, perfusion imaging data are 4D, as they also contain temporal information. Our approach using Gaussian process regression (GPR) makes use of the temporal information in the perfusion source images to reduce the noise level. Over the entire image, our GPR-based noise reduction method gains a 99% contrast-to-noise ratio improvement over the raw image and also improves the quality of hemodynamic maps, allowing better identification of edges and detailed information. At the level of individual voxels, GPR provides a stable baseline, helps identify key parameters from tissue time-concentration curves, and reduces oscillations in the curves. Furthermore, the results show that GPR is superior to the alternative techniques compared in this study.

Third, my research explores automatic segmentation of perfusion images into potentially healthy areas and lesion areas, which can serve as additional information assisting clinical diagnosis. Since perfusion source images contain more information than hemodynamic maps, good utilisation of the source images leads to better understanding than the hemodynamic maps alone. Correlation coefficient tests are used to measure the similarity between the expected time-concentration curves (from reference tissue) and the measured time-concentration curves (from target tissue); this information is then used to distinguish tissue at risk and dead tissue from healthy tissue. A correlation-coefficient-based signal analysis method that directly spots suspected lesion areas in perfusion source images is presented. Our method delivers a clear automatic segmentation of healthy tissue, tissue at risk, and dead tissue; from our segmentation maps, it is easier to identify lesion boundaries than with traditional hemodynamic maps.
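A hedged sketch of the correlation-coefficient segmentation idea, with a synthetic reference curve and voxel curves; the gamma-shaped curves, the delay used to mimic tissue at risk, and the threshold are illustrative assumptions only.

    set.seed(19)
    t <- seq(0, 60, by = 1)                        # time in seconds
    reference <- dgamma(t, shape = 3, rate = 0.2)  # healthy reference tissue curve
    n_vox <- 100
    curves <- sapply(1:n_vox, function(i) {
      delay <- if (i > 80) 15 else 0               # last 20 voxels mimic delayed perfusion
      dgamma(pmax(t - delay, 0), shape = 3, rate = 0.2) +
        rnorm(length(t), sd = 0.01)                # measurement noise
    })
    # Correlation of each voxel's time-concentration curve with the reference:
    r <- apply(curves, 2, function(v) cor(reference, v))
    suspected <- which(r < 0.8)                    # flag potential lesion voxels
    suspected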
|