Modlování vývoje výše škodních událostí / Modeling development of incurred value of claim. Kantorová, Petra, January 2010.
This diploma thesis focuses on estimating the incurred value of a claim and the probability that the claim remains open (not settled) at a specific stage of the insurance settlement process. A change in the incurred value of a claim marks a change of settlement-process stage. A generalized linear model is used to model these changes. The classical linear regression model is a special case of this theory, with stricter assumptions. Among other things, generalized linear models allow the problem of heteroscedasticity to be addressed in an unusual way, using a joint model; this model is applied in the practical part of the thesis. Logistic regression, which is also part of generalized linear model theory, is used here to model the probability that a claim remains open. The model output is presented graphically, in particular as graphs of the probability that the value of a given claim will fall within a certain range.
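The claim-open probability described above is a standard logistic-regression (binomial GLM) task. Below is a minimal Python sketch using entirely synthetic data and hypothetical predictors (claim age and current incurred value); it illustrates the modelling idea only, not the thesis's actual model or data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic claim data (hypothetical features: claim age in quarters,
# current incurred value in thousands).
n = 500
claim_age = rng.integers(1, 12, size=n)
incurred = rng.gamma(shape=2.0, scale=10.0, size=n)
X = np.column_stack([claim_age, incurred])

# Hypothetical ground truth: older, larger claims tend to stay open longer.
logit = -2.0 + 0.15 * claim_age + 0.03 * incurred
still_open = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Logistic regression: a GLM with binomial response and logit link.
model = LogisticRegression().fit(X, still_open)

# Probability that a 4-quarter-old claim with incurred value 25k stays open.
p_open = model.predict_proba(np.array([[4.0, 25.0]]))[0, 1]
print(f"P(claim remains open) = {p_open:.3f}")
```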
Concave selection in generalized linear models. Jiang, Dingfeng, 01 May 2012.
A family of concave penalties, including the smoothly clipped absolute deviation (SCAD) and minimax concave penalty (MCP), has been shown to have attractive properties in variable selection. The computation of concave penalized solutions, however, is a difficult task. We propose a majorization minimization by coordinate descent (MMCD) algorithm to compute the solutions of concave penalized generalized linear models (GLM). In contrast to existing algorithms that use a local quadratic or local linear approximation of the penalty, the MMCD majorizes the negative log-likelihood by a quadratic loss but does not approximate the penalty. This strategy avoids the computation of scaling factors in the iterative steps and hence improves the efficiency of coordinate descent. Under certain regularity conditions, we establish the theoretical convergence of the MMCD algorithm. We implement this algorithm in a penalized logistic regression model using the SCAD and MCP penalties. Simulation studies and a data example demonstrate that the MMCD is sufficiently fast for penalized logistic regression in high-dimensional settings where the number of covariates is much larger than the sample size. Grouping structure among predictors exists in many regression applications. We first propose an l2 grouped concave penalty to incorporate such group information in a regression model. The l2 grouped concave penalty performs group selection and includes the group Lasso as a special case. An efficient algorithm is developed and its theoretical convergence is established under certain regularity conditions. The group selection property of the l2 grouped concave penalty is desirable in some applications, while in others selection at both the group and individual levels is needed. Hence, we propose an l1 grouped concave penalty for variable selection at both individual and group levels.
An efficient algorithm is also developed for the l1 grouped concave penalty. Simulation studies are performed to evaluate the finite-sample performance of the two grouped concave selection methods. The new grouped penalties are also used in analyzing two motivating datasets. The results from both the simulation and real data analyses demonstrate certain benefits of using grouped penalties. Therefore, the proposed concave group penalties are valuable alternatives to the standard concave penalties.
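As a concrete illustration of the penalty family discussed above, here is a small Python sketch of the MCP and the closed-form coordinate-wise thresholding it admits (for the unit-scaling case with gamma > 1); it shows only the penalty and its one-dimensional minimizer, not the MMCD algorithm itself:

```python
import numpy as np

def mcp_penalty(t, lam, gamma):
    """Minimax concave penalty (MCP): concave near zero, flat beyond gamma*lam."""
    t = np.abs(t)
    quad = lam * t - t**2 / (2 * gamma)   # concave region, |t| <= gamma*lam
    flat = gamma * lam**2 / 2             # constant beyond gamma*lam
    return np.where(t <= gamma * lam, quad, flat)

def mcp_threshold(z, lam, gamma):
    """Closed-form minimizer of 0.5*(z - b)^2 + MCP(b), valid for gamma > 1.
    This is the coordinate-wise update used inside coordinate descent."""
    if abs(z) <= gamma * lam:
        soft = np.sign(z) * max(abs(z) - lam, 0.0)   # soft-thresholding
        return soft / (1 - 1 / gamma)                # rescaled shrinkage
    return z  # beyond gamma*lam the penalty is flat: no shrinkage (unbiased)

print(mcp_threshold(0.5, lam=1.0, gamma=3.0))  # small z: shrunk to 0.0
print(mcp_threshold(5.0, lam=1.0, gamma=3.0))  # large z: returned unchanged, 5.0
```

Unlike the Lasso, the MCP applies no shrinkage to large coefficients, which reduces estimation bias while retaining sparsity.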
Hybrid non-linear model predictive control of a run-of-mine ore grinding mill circuit. Botha, Stefan, January 2018.
A run-of-mine (ROM) ore milling circuit is primarily used to grind incoming ore containing precious
metals to a powder fine enough to liberate the valuable minerals contained therein. The ground ore
has a product particle size specification that is set by the downstream separation unit. A ROM ore
milling circuit typically consists of a mill, sump and classifier (most commonly a hydrocyclone). These
circuits are difficult to control because of unmeasurable process outputs, non-linearities, time delays,
large unmeasured disturbances and complex models with modelling uncertainties. The ROM ore
milling circuit should be controlled to meet the final product quality specification, but throughput
should also be maximised. This further complicates ROM ore grinding mill circuit control, since an
inverse non-linear relationship exists between the quality and throughput.
ROM ore grinding mill circuit control is constantly evolving to find the best control method with
peripheral tools to control the plant. Although many studies have been conducted, more are continually
undertaken, since the controller designs are usually based on various assumptions and the required
measurements in the grinding mill circuits are often unavailable. / To improve controller performance, many studies investigated the inclusion of additional manipulated
variables (MVs) in the controller formulation to help control process disturbances, or to provide some
form of functional control. Model predictive control (MPC) is considered one of the best advanced
process control (APC) techniques and linear MPC controllers have been implemented on grinding
mill circuits, while various other advanced controllers have been investigated and tested in simulation.
Because of the complexity of grinding mill circuits, non-linear MPC (NMPC) controllers have achieved
better results in simulations where a wider operating region is required.
In the search for additional MVs some researchers have considered including the discrete dynamics as
part of the controller formulation instead of segregating them from the APC or base-layer controllers.
The discrete dynamics are typically controlled using a layered approach. Discrete dynamics are on/off
elements and in the case of a closed-loop grinding mill circuit the discrete elements can be on/off
activation variables for feed conveyor belts to select which stockpile is used, selecting whether a
secondary grinding stage should be active or not, and switching hydrocyclones in a hydrocyclone
cluster.
Discrete dynamics are added directly to the APC controllers by using hybrid model predictive control
(HMPC). HMPC controllers have been designed for grinding mill circuits, but none of them has
considered the switching of hydrocyclones as an additional MV and they only include linear dynamics
for the continuous elements. This study addresses this gap by implementing a hybrid NMPC (HNMPC)
controller that can switch the hydrocyclones in a cluster. / A commonly used continuous-time grinding mill circuit model with one hydrocyclone is adapted to
contain a cluster of hydrocyclones, resulting in a hybrid model. The model parameters are refitted to
ensure that the initial design steady-state conditions for the model are still valid with the cluster.
The novel contribution of this research is the design of a HNMPC controller using a cluster of
hydrocyclones as an additional MV. The HNMPC controller is formulated using the complete nonlinear
hybrid model and a genetic algorithm (GA) as the solver. An NMPC controller is also designed
and implemented as the base case controller in order to evaluate the HNMPC controller’s performance.
To further illustrate the functional control benefits of including the hydrocyclone cluster as an MV, a
linear optimisation objective was added to the HNMPC to increase the grinding circuit throughput,
while maintaining the quality specification. The results show that the HNMPC controller outperforms the NMPC one in terms of setpoint tracking,
disturbance rejection, and process optimisation objectives. The GA is shown to be a good solver for
HNMPC, resulting in a robust controller that can still control the plant even when state noise is added
to the simulation. / Dissertation (MEng)--University of Pretoria, 2018. / National Research Foundation (DAAD-NRF) / Electrical, Electronic and Computer Engineering / MEng / Unrestricted
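A genetic algorithm handles mixed discrete/continuous manipulated variables naturally, which is why it suits a hybrid NMPC formulation. The sketch below optimises a deliberately made-up single-step objective over an integer hydrocyclone count and a continuous feed rate; the cost function and bounds are placeholders for illustration, not the circuit model or objective used in this study:

```python
import numpy as np

rng = np.random.default_rng(1)

def cost(n_cyclones, feed_rate):
    """Hypothetical stand-in for the plant objective: track a product-size
    setpoint while rewarding throughput (NOT the thesis model)."""
    product_size = 80.0 - 3.0 * n_cyclones - 0.05 * feed_rate
    return (product_size - 65.0) ** 2 - 0.01 * feed_rate

# Population of candidate MV vectors:
# (integer cyclone count in 1..6, continuous feed rate in 0..400).
pop = np.column_stack([rng.integers(1, 7, 100).astype(float),
                       rng.uniform(0, 400, 100)])

for _ in range(50):
    fitness = np.array([cost(int(n), f) for n, f in pop])
    parents = pop[np.argsort(fitness)[:20]]            # truncation selection
    children = parents[rng.integers(0, 20, 80)].copy() # clone parents
    # Integer mutation for the discrete MV, Gaussian mutation for the continuous MV.
    children[:, 0] = np.clip(np.rint(children[:, 0] + rng.integers(-1, 2, 80)), 1, 6)
    children[:, 1] = np.clip(children[:, 1] + rng.normal(0, 10, 80), 0, 400)
    pop = np.vstack([parents, children])               # elitist replacement

best = pop[np.argmin([cost(int(n), f) for n, f in pop])]
print(f"cyclones={int(best[0])}, feed={best[1]:.1f}")
```

In a receding-horizon controller this search would be repeated at every sampling instant over a sequence of future moves, with only the first move applied to the plant.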
Multiple Learning for Generalized Linear Models in Big Data. Xiang Liu (11819735), 19 December 2021.
Big data is an enabling technology in digital transformation. It complements ordinary linear models and generalized linear models well, since training well-performing models of either kind requires huge amounts of data. With the help of big data, ordinary and generalized linear models can be well trained and thus offer better services. However, many challenges remain in training ordinary and generalized linear models on big data. Among the most prominent are computational challenges: the memory-inflation and training-inefficiency issues that occur when processing data and training models. Hundreds of algorithms have been proposed to alleviate or overcome the memory-inflation issues, but the solutions obtained are locally optimal. Additionally, most of the proposed algorithms require loading the dataset into RAM many times when updating the model parameters, and when multiple hyper-parameter settings must be computed and compared, e.g. in ridge regression, parallel computing techniques are applied in practice. Multiple learning with sufficient-statistics arrays is therefore proposed to tackle the memory-inflation and training-inefficiency issues.
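The idea of sufficient-statistics arrays can be sketched for ridge regression: accumulate X'X and X'y in a single pass over chunked data, then solve for any number of hyper-parameter values without revisiting the data. A minimal Python illustration on synthetic data (not the dissertation's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
XtX = np.zeros((p, p))   # sufficient statistic: Gram matrix X'X
Xty = np.zeros(p)        # sufficient statistic: cross-moment X'y

# Stream the data in chunks: each chunk is visited exactly once,
# regardless of how many hyper-parameters are evaluated later.
true_beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
for _ in range(20):                       # 20 chunks of 1_000 rows each
    X = rng.normal(size=(1_000, p))
    y = X @ true_beta + rng.normal(size=1_000)
    XtX += X.T @ X
    Xty += X.T @ y

# Ridge solutions for many lambdas from the same accumulated statistics:
# beta(lambda) = (X'X + lambda*I)^{-1} X'y  -- no second pass over the data.
for lam in [0.0, 1.0, 100.0]:
    beta = np.linalg.solve(XtX + lam * np.eye(p), Xty)
    print(lam, np.round(beta, 2))
```

The memory footprint is O(p^2) rather than O(np), and comparing hyper-parameters costs only repeated p-by-p solves.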
A Bayesian approach to energy monitoring optimization. Carstens, Herman, January 2017.
This thesis develops methods for reducing energy Measurement and Verification (M&V) costs through
the use of Bayesian statistics. M&V quantifies the savings of energy efficiency and demand side
projects by comparing the energy use in a given period to what that use would have been, had no
interventions taken place. The case of a large-scale lighting retrofit study, where incandescent lamps
are replaced by Compact Fluorescent Lamps (CFLs), is considered. These projects often need to be
monitored over a number of years with a predetermined level of statistical rigour, making M&V very
expensive.
M&V lighting retrofit projects have two interrelated uncertainty components that need to be addressed,
and which form the basis of this thesis. The first is the uncertainty in the annual energy use of the
average lamp, and the second the persistence of the savings over multiple years, determined by the
number of lamps that are still functioning in a given year. For longitudinal projects, the results from
these two aspects need to be obtained for multiple years.
This thesis addresses these problems by using the Bayesian statistical paradigm. Bayesian statistics is
still relatively unknown in M&V, and presents an opportunity for increasing the efficiency of statistical
analyses, especially for such projects.
After a thorough literature review, especially of measurement uncertainty in M&V, and an introduction
to Bayesian statistics for M&V, three methods are developed. These methods address the three types
of uncertainty in M&V: measurement, sampling, and modelling. The first method is a low-cost energy
meter calibration technique. The second method is a Dynamic Linear Model (DLM) with Bayesian
Forecasting for determining the size of the metering sample that needs to be taken in a given year.
The third method is a Dynamic Generalised Linear Model (DGLM) for determining the size of the
population survival survey sample.
It is often required by law that M&V energy meters be calibrated periodically by accredited laboratories.
This can be expensive and inconvenient, especially if the facility needs to be shut down for meter
installation or removal. Some jurisdictions also require meters to be calibrated in-situ, that is, in their
operating environments. However, it is shown that metering uncertainty makes a relatively small
contribution to overall M&V uncertainty in the presence of sampling, and therefore the costs of such laboratory
calibration may outweigh the benefits. The proposed technique uses another commercial-grade meter
(which also measures with error) to achieve this calibration in-situ. This is done by accounting for the
mismeasurement effect through a mathematical technique called Simulation Extrapolation (SIMEX).
The SIMEX result is refined using Bayesian statistics, and achieves acceptably low error rates and
accurate parameter estimates.
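The SIMEX idea can be illustrated in a few lines: deliberately add extra measurement error at several levels, watch the naive estimate degrade, and extrapolate the trend back to the zero-error level. The Python sketch below uses a simple attenuated-slope example with made-up numbers and omits the Bayesian refinement step described above:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical meter problem: true usage x, observed w = x + u with known
# measurement-error scale sigma_u; the naive regression slope is attenuated.
n, beta, sigma_u = 5_000, 2.0, 0.5
x = rng.normal(0, 1, n)
w = x + rng.normal(0, sigma_u, n)
y = beta * x + rng.normal(0, 0.5, n)

def naive_slope(w_noisy):
    return np.cov(w_noisy, y)[0, 1] / np.var(w_noisy)

# SIMEX: simulate extra error at levels zeta >= 0 ...
zetas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
slopes = []
for z in zetas:
    sims = [naive_slope(w + rng.normal(0, np.sqrt(z) * sigma_u, n))
            for _ in range(50)]
    slopes.append(np.mean(sims))

# ... then extrapolate slope(zeta) back to zeta = -1 (zero total error).
coef = np.polyfit(zetas, slopes, 2)
simex_slope = np.polyval(coef, -1.0)
print(f"naive={slopes[0]:.2f}, simex={simex_slope:.2f}, true={beta}")
```

The quadratic extrapolant is a common default; the correction is approximate, but it recovers most of the attenuation caused by measurement error.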
The second technique uses a DLM with Bayesian forecasting to quantify the uncertainty in metering
only a sample of the total population of lighting circuits. A Genetic Algorithm (GA) is then applied
to determine an efficient sampling plan. Bayesian statistics is especially useful in this case because
it allows the results from previous years to inform the planning of future samples. It also allows for
exact uncertainty quantification, where current confidence interval techniques do not always do so.
Results show a cost reduction of up to 66%, but this depends on the costing scheme used. The study
then explores the robustness of the efficient sampling plans to forecast error, and finds a 50% chance
of undersampling for such plans, due to the standard M&V sampling formula which lacks statistical
power.
The third technique uses a DGLM in the same way as the DLM, except for population survival
survey samples and persistence studies, not metering samples. Convolving the binomial survey result
distributions inside a GA is problematic, and instead of Monte Carlo simulation, a relatively new
technique called Mellin Transform Moment Calculation is applied to the problem. The technique is
then expanded to model stratified sampling designs for heterogeneous populations. Results show a
cost reduction of 17-40%, although this depends on the costing scheme used.
Finally the DLM and DGLM are combined into an efficient overall M&V plan where metering and
survey costs are traded off over multiple years, while still adhering to statistical precision constraints.
This is done for simple random sampling and stratified designs. Monitoring costs are reduced by
26-40% for the costing scheme assumed.
The results demonstrate the power and flexibility of Bayesian statistics for M&V applications, both in
terms of exact uncertainty quantification, and by increasing the efficiency of the study and reducing
monitoring costs. / Hierdie proefskrif ontwikkel metodes waarmee die koste van energiemonitering en verifieëring (M&V)
deur Bayesiese statistiek verlaag kan word. M&V bepaal die hoeveelheid besparings wat deur
energiedoeltreffendheid- en vraagkantbestuurprojekte behaal kan word. Dit word gedoen deur die
energieverbruik in ’n gegewe tydperk te vergelyk met wat dit sou wees indien geen ingryping plaasgevind
het nie. ’n Grootskaalse beligtingsretrofitstudie, waar filamentgloeilampe met fluoresserende
spaarlampe vervang word, dien as ’n gevallestudie. Sulke projekte moet gewoonlik oor baie jare met
’n vasgestelde statistiese akkuuraatheid gemonitor word, wat M&V duur kan maak.
Twee verwante onsekerheidskomponente moet in M&V beligtingsprojekte aangespreek word, en vorm
die grondslag van hierdie proefskrif. Ten eerste is daar die onsekerheid in jaarlikse energieverbruik
van die gemiddelde lamp. Ten tweede is daar die volhoubaarheid van die besparings oor veelvoudige
jare, wat bepaal word deur die aantal lampe wat tot in ’n gegewe jaar behoue bly. Vir longitudinale
projekte moet hierdie twee komponente oor veelvoudige jare bepaal word.
Hierdie proefskrif spreek die probleem deur middel van ’n Bayesiese paradigma aan. Bayesiese
statistiek is nog relatief onbekend in M&V, en bied ’n geleentheid om die doeltreffendheid van
statistiese analises te verhoog, veral vir bogenoemde projekte.
Die proefskrif begin met ’n deeglike literatuurstudie, veral met betrekking tot metingsonsekerheid
in M&V. Daarna word ’n inleiding tot Bayesiese statistiek vir M&V voorgehou, en drie metodes
word ontwikkel. Hierdie metodes spreek die drie hoofbronne van onsekerheid in M&V aan: metings,
opnames, en modellering. Die eerste metode is ’n laekoste energiemeterkalibrasietegniek. Die
tweede metode is ’n Dinamiese Linieêre Model (DLM) met Bayesiese vooruitskatting, waarmee meter
opnamegroottes bepaal kan word. Die derde metode is ’n Dinamiese Veralgemeende Linieêre Model
(DVLM), waarmee bevolkingsoorlewing opnamegroottes bepaal kan word.
Volgens wet moet M&V energiemeters gereeld deur erkende laboratoria gekalibreer word. Dit kan
duur en ongerieflik wees, veral as die aanleg tydens meterverwydering en -installering afgeskakel moet
word. Sommige regsgebiede vereis ook dat meters in-situ gekalibreer word; in hul bedryfsomgewings.
Tog word dit aangetoon dat metingsonsekerheid ’n klein deel van die algehele M&V onsekerheid
beslaan, veral wanneer opnames gedoen word. Dit bevraagteken die kostevoordeel van laboratoriumkalibrering.
Die voorgestelde tegniek gebruik ’n ander kommersieële-akkuurraatheidsgraad meter
(wat self ’n nie-weglaatbare metingsfout bevat), om die kalibrasie in-situ te behaal. Dit word gedoen
deur die metingsfout deur SIMulerings EKStraptolering (SIMEKS) te verminder. Die SIMEKS resultaat
word dan deur Bayesiese statistiek verbeter, en behaal aanvaarbare foutbereike en akkuurate
parameterafskattings.
Die tweede tegniek gebruik ’n DLM met Bayesiese vooruitskatting om die onsekerheid in die meting
van die opnamemonster van die algehele bevolking af te skat. ’n Genetiese Algoritme (GA) word
dan toegepas om doeltreffende opnamegroottes te vind. Bayesiese statistiek is veral nuttig in hierdie
geval aangesien dit vorige jare se uitslae kan gebruik om huidige afskattings te belig Dit laat ook
die presiese afskatting van onsekerheid toe, terwyl standaard vertrouensintervaltegnieke dit nie doen
nie. Resultate toon ’n kostebesparing van tot 66%. Die studie ondersoek dan die standvastigheid van
kostedoeltreffende opnameplanne in die teenwoordigheid van vooruitskattingsfoute. Dit word gevind
dat kostedoeltreffende opnamegroottes 50% van die tyd te klein is, vanweë die gebrek aan statistiese
krag in die standaard M&V formules.
Die derde tegniek gebruik ’n DVLM op dieselfde manier as die DLM, behalwe dat bevolkingsoorlewingopnamegroottes
ondersoek word. Die saamrol van binomiale opname-uitslae binne die GA skep ’n
probleem, en in plaas van ’n Monte Carlo simulasie word die relatiewe nuwe Mellin Vervorming
Moment Berekening op die probleem toegepas. Die tegniek word dan uitgebou om laagsgewyse
opname-ontwerpe vir heterogene bevolkings te vind. Die uitslae wys ’n 17-40% kosteverlaging,
alhoewel dit van die koste-skema afhang.
Laastens word die DLM en DVLM saamgevoeg om ’n doeltreffende algehele M&V plan, waar meting
en opnamekostes teen mekaar afgespeel word, te ontwerp. Dit word vir eenvoudige en laagsgewyse
opname-ontwerpe gedoen. Moniteringskostes word met 26-40% verlaag, maar hang van die aangenome
koste-skema af.
Die uitslae bewys die krag en buigsaamheid van Bayesiese statistiek vir M&V toepassings, beide vir
presiese onsekerheidskwantifisering, en deur die doeltreffendheid van die dataverbruik te verhoog en
sodoende moniteringskostes te verlaag. / Thesis (PhD)--University of Pretoria, 2017. / National Research Foundation / Department of Science and Technology / National Hub for the Postgraduate
Programme in Energy Efficiency and Demand Side Management / Electrical, Electronic and Computer Engineering / PhD / Unrestricted
Autonomous Vertical Autorotation for Unmanned Helicopters. Dalamagkidis, Konstantinos, 30 July 2009.
Small Unmanned Aircraft Systems (UAS) are considered the stepping stone for the integration of civil unmanned vehicles in the National Airspace System (NAS) because of their low cost and risk. Such systems are aimed at a variety of applications, including search and rescue, surveillance, communications, traffic monitoring and inspection of buildings, power lines and bridges. Among these systems, small helicopters play an important role because of their capability to hold a position, to maneuver in tight spaces and to take off and land from virtually anywhere. Nevertheless, civil adoption of such systems is minimal, mostly because of regulatory problems that in turn are due to safety concerns.
This dissertation examines the risk to safety imposed by UAS in general and small helicopters in particular, focusing on accidents resulting in a ground impact. To improve the performance of small helicopters in this area, the use of autonomous autorotation is proposed. This research goes beyond previous work in the area of autonomous autorotation by developing an on-line, model-based, real-time controller that is capable of handling constraints and different cost functions. The approach selected is based on a non-linear model-predictive controller that is augmented by a neural network to improve the speed of the non-linear optimization. The immediate benefit of this controller is that a class of failures that would otherwise result in an uncontrolled crash and possible injuries or fatalities can now be accommodated. Furthermore, besides simply landing the helicopter, the controller is also capable of minimizing the risk of serious injury to people in the area. This is accomplished by minimizing the kinetic energy during the last phase of the descent. The presented research is designed to benefit the entire UAS community as well as the public, by allowing for safer UAS operations, which in turn also allow faster and less expensive integration of UAS in the NAS.
Extensions to Bayesian generalized linear mixed effects models for household tuberculosis transmission. McIntosh, Avery Isaac, 12 May 2017.
Understanding tuberculosis transmission is vital for efforts at interrupting the spread of disease. Household contact studies that follow persons sharing a household with a TB case—so-called household contacts—and test for latent TB infection by tuberculin skin test conversion give investigators vital information about risk factors for TB transmission. In these studies, investigators often assume secondary cases are infected by the primary TB case, despite substantial evidence that infection from a source outside the home is often equally likely, especially in high-prevalence settings. Investigators may discard information on contacts who test positive at study initiation due to uncertainty of the infection source, or assume infected contacts were infected from the index case prior to study initiation. With either assumption, information on transmission dynamics is lost or incomplete, and estimates of household risk factors for transmission will be biased. This dissertation describes an approach to modeling TB transmission that accounts for community-acquired transmission in the estimation of transmission risk factors from household contact study data. The proposed model generates population-specific estimates of the probability a contact of an infectious case will be infected from a source outside the home—a vital statistic for planning effective interventions to halt disease spread—in addition to estimates of household transmission predictors. We first describe the model analytically, and then apply it to synthetic datasets under different risk scenarios. We then fit the model to data taken from three household contact studies in different locations: Brazil, India, and Uganda.
Infection predictors such as contact sleeping proximity to the index case and index case disease severity are underestimated in standard models compared to the proposed method, and non-household TB infection risk increases with age stratum, reflecting longer at-risk duration for community-based exposure for older contacts. This analysis will aid public health planners in understanding how best to interrupt TB spread in disparate populations by characterizing where transmission risk is greatest and which risk factors influence household-acquired transmission. Finally, we present an open-source software package in the R environment titled upmfit for modular implementation of the Bayesian Markov Chain Monte Carlo methods used to estimate the model.
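The competing-sources idea above can be sketched with a toy likelihood in which a contact escapes infection only by avoiding both the community source and the household source. The Python example below fits this two-parameter model to synthetic data by maximum likelihood; the exposure score, parameter values, and estimation method are illustrative assumptions, and the dissertation's actual Bayesian GLMM estimated by MCMC is not reproduced here:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)

# Hypothetical household-contact data: contact i has exposure score x_i;
# infection can come from the community (prob c) OR the household, with
# per-unit-exposure infection probability h (so escape prob (1 - h)**x).
n = 2_000
x = rng.integers(0, 5, n)                 # 0 = no shared household exposure
c_true, h_true = 0.30, 0.15
p = 1 - (1 - c_true) * (1 - h_true) ** x
infected = rng.binomial(1, p)

def nll(theta):
    """Negative log-likelihood of the two-source escape model."""
    c, h = theta
    pr = 1 - (1 - c) * (1 - h) ** x
    pr = np.clip(pr, 1e-9, 1 - 1e-9)
    return -np.sum(infected * np.log(pr) + (1 - infected) * np.log(1 - pr))

est = minimize(nll, x0=[0.2, 0.2], bounds=[(1e-4, 0.999)] * 2).x
print(f"community prob ~ {est[0]:.2f}, household per-exposure prob ~ {est[1]:.2f}")
```

Contacts with zero household exposure identify the community component, which is what lets the model separate the two infection sources.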
Testing and Estimation for Functional Data with Applications to Magnetometer Records. Maslova, Inga, 01 May 2009.
The functional linear model, $Y_n = \Psi X_n + \varepsilon_n$, with functional response and explanatory variables is considered. A simple test of the nullity of $\Psi$ based on the principal component decomposition is proposed. The test statistic has an asymptotic chi-squared distribution, which is also an excellent approximation in finite samples. The methodology is applied to data from terrestrial magnetic observatories.
In recent years, the interaction of the auroral substorms with the equatorial and mid-latitude currents has been the subject of extensive research. We introduce a new statistical technique that allows us to test at a specified significance level whether such a dependence exists, and how long it persists. This quantitative statistical technique, relying on the concepts and tools of functional data analysis, directly uses magnetometer records at one-minute resolution, and it can be applied to similar geophysical data which can be represented as daily curves. It is conceptually similar to testing the nullity of the slope in straight-line regression, but both the regressors and the responses are curves rather than points. When the regressors are daily high latitude $H$--component curves during substorm days and the responses are daily mid-- or low latitude $H$--component curves, our test shows significant dependence (the nullity hypothesis is rejected), which exists not only on the same UT day, but also extends into the next day for strong substorms.
We propose a novel approach based on wavelet and functional principal component analysis to produce a cleaner index of the intensity of the symmetric ring current. We use functional canonical correlations to show that the new approach more effectively extracts symmetric global features. The main result of our work is the construction of a new index, which is an improved version of the existing wavelet-based index (WISA) and the old Dst index, in which a constant daily variation is removed. Here, we address the fact that the daily component varies from day to day and construct a ``cleaner'' index by removing non-constant daily variations.
A wavelet-based method of deconvoluting the solar quiet variation from the low and mid-latitude H-component records is proposed. The resulting daily variation is non--constant, and its day--to--day variability is quantified by functional principal component scores. The procedure removes the signature of an enhanced ring current by comparing the scores at different stations. The method is fully algorithmic and is implemented in the statistical software R.
An R package for space physics applications is developed. It consists of several functions that compute indices of storm activity and estimate the daily variation. Storm indices are computed automatically, without any human intervention, using the most recent magnetometer data available. Functional principal component analysis techniques are used to extract day-to-day variations. The package will be publicly available on the Comprehensive R Archive Network (CRAN).
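Extracting day-to-day variation with functional principal components reduces, for discretized daily curves, to an SVD of the centered data matrix. The toy example below uses synthetic curves in place of magnetometer records; the package described above is in R, so this Python snippet only illustrates the underlying computation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic "daily curves": 200 days x 1440 one-minute samples, built from
# two smooth modes of variation plus noise (a stand-in for H-component records).
t = np.linspace(0, 2 * np.pi, 1440)
modes = np.vstack([np.sin(t), np.cos(2 * t)])          # true functional modes
scores_true = rng.normal(size=(200, 2)) * [10.0, 3.0]  # day-to-day amplitudes
curves = scores_true @ modes + rng.normal(0, 0.5, (200, 1440))

# Functional PCA on the centered curves via SVD.
centered = curves - curves.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)

# Day-to-day variability is summarized by the first few PC scores;
# rows of Vt[:2] estimate the functional modes themselves.
pc_scores = centered @ Vt[:2].T
print("variance explained by 2 PCs:", round(explained[:2].sum(), 3))
```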
Generalized Minimum Penalized Hellinger Distance Estimation and Generalized Penalized Hellinger Deviance Testing for Generalized Linear Models: The Discrete Case. Yan, Huey, 01 May 2001.
In this dissertation, robust and efficient alternatives to quasi-likelihood estimation and likelihood ratio tests are developed for discrete generalized linear models. The estimation method considered is a penalized minimum Hellinger distance procedure that generalizes a procedure developed by Harris and Basu for estimating parameters of a single discrete probability distribution from a random sample. A bootstrap algorithm is proposed to select the weight of the penalty term. Simulations are carried out to compare the new estimators with quasi-likelihood estimation. The robustness of the estimation procedure is demonstrated by simulation work and by Hampel's α-influence curve. Penalized minimum Hellinger deviance tests for goodness-of-fit and for testing nested linear hypotheses are proposed and simulated. A nonparametric bootstrap algorithm is proposed to obtain critical values for the testing procedure.
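For a single discrete distribution, the minimum Hellinger distance idea (without the penalty term or the GLM extensions of the dissertation) can be sketched in a few lines: minimise the Hellinger distance between the empirical pmf and the model pmf. The Python example below uses a contaminated Poisson sample to show the robustness that motivates the approach:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(4)

# Poisson(3) sample contaminated with one gross outlier.
data = np.append(rng.poisson(3.0, 200), 100)

def hellinger_sq(lam):
    """Squared Hellinger distance between the empirical pmf and Poisson(lam):
    HD^2 = sum_k (sqrt(d_k) - sqrt(f_k))^2 = 2 - 2 * sum_k sqrt(d_k * f_k),
    where the cross term is nonzero only on the observed support."""
    values, counts = np.unique(data, return_counts=True)
    emp = counts / len(data)
    model = poisson.pmf(values, lam)
    return 2.0 - 2.0 * np.sum(np.sqrt(emp * model))

mhd = minimize_scalar(hellinger_sq, bounds=(0.1, 50), method="bounded").x
print(f"MLE (sample mean) = {data.mean():.2f}, MHD estimate = {mhd:.2f}")
```

The maximum likelihood estimate (the sample mean) is pulled upward by the outlier, while the minimum Hellinger distance estimate stays near the true rate of 3, because the outlier's contribution enters only through the square root of a tiny probability.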
A Longitudinal Analysis to Compare a Tailored Web-Based Intervention and Tailored Phone Counseling to Usual Care for Improving Beliefs of Colorectal Cancer Screening. Dorman, Hannah Louise, 07 1900.
Indiana University-Purdue University Indianapolis (IUPUI) / Longitudinal data on beliefs regarding colorectal cancer (CRC) screening, collected at three time points, were analyzed to determine whether beliefs improved under the Web-Based, Phone-Based, or Web + Phone interventions compared with Usual Care. A mixed linear model adjusting for baseline and controlling for covariates was used to estimate the intervention effects; the Web-Based intervention was the most efficacious in improving beliefs, and the phone intervention was also efficacious for several beliefs, compared with usual care.
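The analysis described can be sketched with a random-intercept mixed model. The Python example below fits a `statsmodels` MixedLM to synthetic longitudinal data with hypothetical effect sizes; it illustrates the model form only, not the study's data, arms, or results:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)

# Hypothetical longitudinal data: a belief score at two follow-up times,
# adjusted for baseline, with a random intercept per participant.
n = 300
arm = rng.integers(0, 2, n)                 # 0 = usual care, 1 = web-based
baseline = rng.normal(50, 10, n)
subj_effect = rng.normal(0, 4, n)           # participant-level random intercept

rows = []
for time in (1, 2):
    score = (0.6 * baseline + 5.0 * arm + 2.0 * time
             + subj_effect + rng.normal(0, 5, n))
    rows.append(pd.DataFrame({"id": np.arange(n), "arm": arm,
                              "baseline": baseline, "time": time,
                              "score": score}))
df = pd.concat(rows, ignore_index=True)

# Mixed linear model: fixed effects for baseline, arm, and time;
# random intercept grouped by participant id.
fit = smf.mixedlm("score ~ baseline + arm + time", df, groups=df["id"]).fit()
print(fit.params[["arm", "baseline"]])
```

The random intercept accounts for the correlation between repeated measurements on the same participant, so the fixed-effect estimate for `arm` reflects the between-group difference after baseline adjustment.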