921 |
Statistical Methods for Analysis of the Homeowner's Impact on Property Valuation and Its Relation to the Mortgage Portfolio / Statistiska metoder för analys av husägarens påverkan på husvärdet och dess koppling till bolåneportföljen
Hamell, Clara January 2020 (has links)
The current method for house valuation in mortgage portfolio models corresponds to applying a residential property price index (RPPI) to the purchase price (or the last known valuation). This thesis introduces an alternative house valuation method, which combines the current one with the bank's customer data. This approach shows that the gap between the actual house value and the currently estimated house value can to some extent be explained by customer attributes, especially for houses where the homeowner is a defaulted customer. The inclusion of customer attributes can either reduce false overestimation or predict whether the current valuation is an overestimation or an underestimation. This property is of interest in credit risk, as false overestimations can have negative impacts on the mortgage portfolio. The statistical methods used in this thesis were the data mining techniques regression and clustering. / The models and approaches currently used for house valuation in the mortgage portfolio are based on house price indexation and the purchase price. This study introduces an alternative way of estimating the house value, by combining the current method with the bank's own customer data. This approach shows that the gap between the actual and the estimated house value can to some extent be explained by customer data, above all where the homeowner is a defaulted customer. Including customer data can both reduce the current overvaluation and predict whether the current estimate is an overvaluation or an undervaluation. For defaulted customers, the alternative house valuation gave a more truthful estimate of the sale price than the traditional method. This property is of interest in credit risk, since a false overvaluation can have negative consequences for the mortgage portfolio, above all for defaulted customers. The statistical tools used in this study were various regression methods and cluster analysis.
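A minimal sketch of the two valuation routes described above, assuming a hypothetical data layout (all column names and figures are illustrative stand-ins, not data from the thesis):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def rppi_valuation(purchase_price, rppi_at_purchase, rppi_now):
    """Current method: roll the purchase price forward with the RPPI."""
    return purchase_price * rppi_now / rppi_at_purchase

# Hypothetical portfolio snapshot; every column name and figure is made up.
df = pd.DataFrame({
    "purchase_price":   [2.10e6, 3.40e6, 1.80e6, 2.70e6],
    "rppi_at_purchase": [95.0, 110.0, 102.0, 88.0],
    "rppi_now":         [121.0, 121.0, 121.0, 121.0],
    "customer_income":  [420e3, 610e3, 300e3, 520e3],
    "is_defaulted":     [0, 0, 1, 0],
    "sale_price":       [2.60e6, 3.60e6, 1.90e6, 3.50e6],
})
df["indexed_value"] = rppi_valuation(df["purchase_price"],
                                     df["rppi_at_purchase"], df["rppi_now"])
df["gap"] = df["sale_price"] - df["indexed_value"]

# Alternative method: explain the gap with customer attributes and adjust.
X, y = df[["customer_income", "is_defaulted"]], df["gap"]
gap_model = LinearRegression().fit(X, y)
df["adjusted_value"] = df["indexed_value"] + gap_model.predict(X)
```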
|
922 |
Myson Burch Thesis
Myson C Burch (16637289) 08 August 2023 (has links)
With the completion of the Human Genome Project and many additional efforts since, there is an abundance of genetic data that can be leveraged to revolutionize healthcare. Now, there are significant efforts to develop state-of-the-art techniques that reveal insights about connections between genetics and complex diseases such as diabetes, heart disease, or common psychiatric conditions that depend on multiple genes interacting with environmental factors. These methods help pave the way towards diagnosis, cure, and ultimately prediction and prevention of complex disorders. As a part of this effort, we address high-dimensional genomics-related questions through mathematical modeling, statistical methodologies, combinatorics and scalable algorithms. More specifically, we develop innovative techniques at the intersection of technology and the life sciences, using biobank-scale data from genome-wide association studies (GWAS) and machine learning, in an effort to better understand human health and disease.
The underlying principle behind GWAS is a test for association between genotyped variants and the trait of interest for each individual. GWAS have been extensively used to estimate the signed effects of trait-associated alleles and to map genes to disorders, and over the past decade about 10,000 strong associations between genetic variants and one (or more) complex traits have been reported. One of the key challenges in GWAS is population stratification, which can lead to spurious genotype-trait associations. Our work proposes a simple clustering-based approach that corrects for stratification better than existing methods. This method takes the linkage disequilibrium (LD) into account while computing the distance between the individuals in a sample. Our approach, called CluStrat, performs Agglomerative Hierarchical Clustering (AHC) using a regularized Mahalanobis distance-based GRM, which captures the population-level covariance (LD) matrix for the available genotype data.
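A rough sketch of the distance construction just described, on a toy genotype matrix; this illustrates the idea only and is not the published CluStrat implementation:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def ld_regularized_mahalanobis_ahc(G, n_clusters=3, reg=0.1):
    """Cluster individuals with AHC on a Mahalanobis-type distance whose
    covariance term is the regularized SNP-SNP covariance, so LD enters
    the between-individual distance."""
    n, m = G.shape                          # individuals x SNPs
    Gc = G - G.mean(axis=0)                 # center each SNP column
    S = (Gc.T @ Gc) / n                     # m x m covariance, captures LD structure
    W = np.linalg.inv(S + reg * np.eye(m))  # ridge regularization keeps it invertible
    D = np.zeros((n, n))                    # pairwise Mahalanobis distances
    for i in range(n):
        diff = Gc - Gc[i]
        D[i] = np.sqrt(np.einsum("ij,jk,ik->i", diff, W, diff))
    Z = linkage(D[np.triu_indices(n, k=1)], method="average")
    return fcluster(Z, n_clusters, criterion="maxclust")

# Toy genotype matrix with 0/1/2 minor-allele counts.
rng = np.random.default_rng(0)
labels = ld_regularized_mahalanobis_ahc(rng.integers(0, 3, size=(60, 20)).astype(float))
```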
Linear mixed models (LMMs) have been a popular and powerful method for conducting GWAS in the presence of population structure, but they are computationally expensive relative to simpler techniques. We implement matrix sketching in LMMs (MaSk-LMM) to mitigate the more expensive computations. Matrix sketching is an approximation technique in which random projections compress the original dataset into one that is significantly smaller and still preserves some properties of the original up to a guaranteed approximation ratio. The technique applies naturally to problems in genetics, where a large biobank can be treated as a matrix whose rows represent samples and whose columns represent SNPs. Such matrices are very large, owing to the large number of individuals and markers in biobanks, and can benefit from sketching. Our approach tackles the bottleneck of LMMs directly by sketching the samples of the genotype matrix, and by sketching the markers during the computation of the relatedness or kinship matrix (GRM).
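A hedged sketch of the sketching idea (the real MaSk-LMM sketches both samples and markers inside an LMM solver; only the marker-sketched GRM step is illustrated here, on invented toy dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_sketch(X, s):
    """Compress the rows of X from n to s with a Gaussian random projection;
    S @ X preserves inner products in expectation (Johnson-Lindenstrauss)."""
    n = X.shape[0]
    S = rng.normal(0.0, 1.0 / np.sqrt(s), size=(s, n))
    return S @ X

# Toy genotype matrix: rows are samples, columns are SNPs, as described above.
n, m = 500, 2000
G = rng.integers(0, 3, size=(n, m)).astype(float)
G -= G.mean(axis=0)

# Sketch the markers when forming the GRM: project the SNP dimension down to
# 400 columns, then form the (approximate) relatedness matrix.
G_marker_sketch = gaussian_sketch(G.T, 400).T      # n x 400
grm_approx = (G_marker_sketch @ G_marker_sketch.T) / m
grm_exact = (G @ G.T) / m
print(np.abs(grm_approx - grm_exact).mean())       # sketching error on this toy
```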
Predictive analytics have been used to improve healthcare by reinforcing decision-making, enhancing patient outcomes, and providing relief for the healthcare system. The prevalence of complex diseases varies greatly around the world. Understanding the basis of this difference in prevalence can help disentangle the interaction among the different factors causing complex disorders and identify groups of people who may be at greater risk of developing certain disorders. This could become the basis for implementing early intervention strategies for populations at higher risk, with significant benefits for public health.
This dissertation broadens our understanding of empirical population genetics. It proposes a data-driven perspective on a variety of problems in genetics, such as confounding factors in genetic structure. It highlights current computational barriers in open problems in genetics and provides robust, scalable and efficient methods to ease the analysis of genotype data.
|
923 |
Vibration-Based Health Monitoring of Rotating Systems with Gyroscopic Effect
Gavrilovic, Nenad 01 March 2015 (has links) (PDF)
This thesis focuses on the simulation of the gyroscopic effect using the software MSC Adams. A simple shaft-disk system was created, and parameters of the system were changed in order to study the influence of the gyroscopic effect. It was shown that an increasing bearing stiffness reduces the precession motion. Furthermore, it was shown that the gyroscopic effect vanishes if the disk of the system is placed symmetrically on the shaft, which reduces the system to a Jeffcott rotor. The second objective of this study was to analyze different defects in a simple fixed-axis gear set. In particular, a cracked shaft, a cracked pinion and a chipped pinion, as well as a healthy gear system, were created and tested in Adams. The contact force between the two gears was monitored, and the 2D and 3D frequency spectra, as well as the wavelet transform, were plotted in order to compare the individual defects. It was shown that the wavelet transform is a powerful tool, capable of identifying a cracked gear under non-constant speed. The last part of this study covered fault detection with statistical methods as well as with the Sideband Energy Ratio (SER). The time-domain signals of the individual faults were used to compare the mean, the standard deviation and the root mean square. Furthermore, the noise profile in the frequency spectrum was tracked with statistical methods using the mean and the standard deviation. It was demonstrated that it is possible to identify a cracked gear, as well as a chipped gear, with statistical methods; a cracked shaft, however, could not be identified. The results also show that SER was only capable of identifying major defects in a gear system, such as a chipped tooth.
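A small sketch of the two diagnostic quantities named above, using one common definition of SER (summed sideband amplitudes over the mesh-tone amplitude); the signal, frequencies and modulation depth are invented for illustration:

```python
import numpy as np

def time_domain_features(x):
    """Mean, standard deviation and RMS of a vibration signal."""
    return {"mean": x.mean(), "std": x.std(), "rms": np.sqrt(np.mean(x ** 2))}

def sideband_energy_ratio(freqs, spectrum, f_mesh, f_shaft, n_pairs=6):
    """SER: summed amplitude of the first sideband pairs around the gear-mesh
    frequency divided by the mesh-tone amplitude. Nearest-bin lookup; the
    windowing/averaging details of commercial tools are omitted."""
    amp = lambda f: spectrum[np.argmin(np.abs(freqs - f))]
    sidebands = sum(amp(f_mesh + k * f_shaft) + amp(f_mesh - k * f_shaft)
                    for k in range(1, n_pairs + 1))
    return sidebands / amp(f_mesh)

# Toy contact-force-like signal: a mesh tone amplitude-modulated at shaft rate,
# the kind of signature a cracked or chipped tooth produces.
fs = 10_000
t = np.arange(0, 1.0, 1 / fs)
f_shaft, f_mesh = 20.0, 600.0
x = (1 + 0.3 * np.sin(2 * np.pi * f_shaft * t)) * np.sin(2 * np.pi * f_mesh * t)
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
print(time_domain_features(x))
print(sideband_energy_ratio(freqs, spectrum, f_mesh, f_shaft))
```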
|
924 |
Expeditious Causal Inference for Big Observational Data
Yumin Zhang (13163253) 28 July 2022 (has links)
This dissertation addresses two significant challenges in the causal inference workflow for Big Observational Data. The first is designing Big Observational Data with high-dimensional and heterogeneous covariates. The second is performing uncertainty quantification for estimates of causal estimands obtained from the application of black-box machine learning algorithms to the designed Big Observational Data. The methodologies developed by addressing these challenges are applied to the design and analysis of Big Observational Data from a large public university in the United States.
Distributed Design
A fundamental issue in causal inference for Big Observational Data is confounding due to covariate imbalances between treatment groups. This can be addressed by designing the study prior to analysis. The design ensures that subjects in the different treatment groups who have comparable covariates are subclassified or matched together. Analyzing such a designed study helps to reduce biases arising from the confounding of covariates with treatment. Existing design methods, developed for traditional observational studies with a single designer, can yield unsatisfactory designs with sub-optimal covariate balance for Big Observational Data, because they cannot accommodate the massive dimensionality, heterogeneity, and volume of the Big Data. We propose a new framework for the distributed design of Big Observational Data amongst collaborative designers. Our framework first assigns subsets of the high-dimensional and heterogeneous covariates to multiple designers. The designers then summarize their covariates into lower-dimensional quantities, share their summaries with the others, and design the study in parallel based on their assigned covariates and the summaries they receive. The final design is selected by comparing balance measures for all covariates across the candidates and identifying the best amongst them. We perform simulation studies and analyze datasets from the 2016 Atlantic Causal Inference Conference Data Challenge to demonstrate the flexibility and power of our framework for constructing designs with good covariate balance from Big Observational Data.
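A heavily simplified sketch of the distributed-design flow just described (the covariate split, the principal-component summary and the quintile subclassification are all illustrative choices, not the dissertation's procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def smd(x, treated):
    """Standardized mean difference: a covariate balance measure for designs."""
    xt, xc = x[treated], x[~treated]
    return abs(xt.mean() - xc.mean()) / np.sqrt((xt.var() + xc.var()) / 2)

# Toy setting: ten covariates split between two "designers".
n = 500
X = rng.normal(size=(n, 10))
treated = rng.random(n) < 0.5
blocks = [X[:, :5], X[:, 5:]]

# Each designer shares a one-dimensional summary of its block (first PC scores).
def first_pc_scores(M):
    Mc = M - M.mean(axis=0)
    return Mc @ np.linalg.svd(Mc, full_matrices=False)[2][0]

summaries = [first_pc_scores(b) for b in blocks]

# Designers work in parallel: each subclassifies on its own covariates plus the
# summary received from the other, producing one candidate design each.
candidates = []
for own in range(2):
    working = np.column_stack([blocks[own], summaries[1 - own]])
    score = first_pc_scores(working)
    candidates.append(np.digitize(score, np.quantile(score, [0.2, 0.4, 0.6, 0.8])))

# Final design: the candidate with the best worst-case balance over ALL covariates.
def worst_smd(strata):
    return max(smd(X[strata == s, j], treated[strata == s])
               for s in np.unique(strata) for j in range(X.shape[1]))

best_design = min(candidates, key=worst_smd)
```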
Designed Bootstrap
The combination of modern machine learning algorithms with the nonparametric bootstrap can enable effective predictions and inferences on Big Observational Data. An increasingly prominent and critical objective in such analyses is to draw causal inferences from the Big Observational Data. A fundamental step towards this objective is to design the observational study prior to the application of the machine learning algorithms. However, applying the traditional nonparametric bootstrap to Big Observational Data requires excessive computational effort, because every bootstrap sample would need to be re-designed under the traditional approach, which can be prohibitive in practice. We propose a design-based bootstrap for deriving causal inferences, with reduced bias, from the application of machine learning algorithms to Big Observational Data. Our bootstrap procedure operates by resampling from the original designed observational study. It eliminates the need for the additional, costly design steps on each bootstrap sample that are performed under the standard nonparametric bootstrap. We demonstrate the computational efficiency of this procedure compared to the traditional nonparametric bootstrap, and its equivalence in terms of confidence-interval coverage rates for average treatment effects, by means of simulation studies and a real-life case study.
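A minimal sketch of the resampling step, assuming the design stage has already produced matched pairs (the black-box ML estimator of the full method is replaced here by a plain pair-difference ATE):

```python
import numpy as np

rng = np.random.default_rng(0)

def designed_bootstrap_ci(y_treated, y_control, B=2000, alpha=0.05):
    """Resample matched pairs from the already-designed study; no re-design is
    performed inside the bootstrap loop, which is the source of the savings."""
    diffs = np.asarray(y_treated) - np.asarray(y_control)
    n = diffs.size
    ates = np.array([diffs[rng.integers(0, n, n)].mean() for _ in range(B)])
    return diffs.mean(), np.quantile(ates, [alpha / 2, 1 - alpha / 2])

# Toy designed study: 300 matched pairs with an invented treatment effect.
y_t = rng.normal(1.0, 1.0, 300)   # outcomes of treated units
y_c = rng.normal(0.3, 1.0, 300)   # outcomes of their matched controls
ate_hat, (lo, hi) = designed_bootstrap_ci(y_t, y_c)
print(f"ATE ~ {ate_hat:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```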
Case Study
We apply the distributed design and designed bootstrap methodologies in a case study involving institutional data from a large public university in the United States. The institutional data contain comprehensive information about the undergraduate students in the university, ranging from their academic records to their on-campus activities. We study the causal effects of undergraduate students' attempted course load on their academic performance, based on a selection of covariates from these data. Ultimately, our real-life case study demonstrates how our methodologies enable researchers to use straightforward design procedures to obtain valid causal inferences, with reduced computational effort, from the application of machine learning algorithms to Big Observational Data.
|
925 |
Strategic Designs for Online Platforms
Weilong Wang (13900263) 10 October 2022 (links)
Platforms are now everywhere in our society. Some platforms share real-time information on many topics, e.g., transportation, weather, and news. Online learning platforms, for example, can play a significant role in accelerating learning by providing more real-time feedback loops. Owing to recent innovations in mobile devices and faster networks, live streaming platforms have become a new trend; they are used, among other things, for sharing gaming experiences (e.g., Twitch) or shopping experiences (e.g., Amazon Live). My dissertation studies the strategic designs of different online platforms, especially how information affects users' strategic behaviors and how it creates different market outcomes.
|
926 |
Inquiry into Additionality in the Solar Policy Framework
Michael Liam Smith (18410295) 19 April 2024 (has links)
An inquiry into the additionality of the income tax credit program for solar purchasing in Ohio, where aggregation electric purchasing programs exist.

In the State of Ohio, a unique feature of the electric market regulatory landscape permits local governments to become energy suppliers to their residents and small businesses through programs known as community choice aggregation (CCA). Some of these programs guarantee 100% renewable electricity to all enrollees. Concurrently, the federal government offers an income tax credit (ITC) for the purchase of a solar array. When policy incentives are offered, it is important to ensure that they induce their target audience to act in ways that would not be observed without the incentive; this is known as "additionality." In the context of carbon-emissions-reduction goals, individuals who claim the ITC while already having 100% renewable electricity would violate additionality. In other words, these renewable aggregation programs may crowd out the benefits of the ITC. This paper assesses the additionality of the ITC in the context of Ohio's CCA programs. Actual additionality can depend on whether renewable energy is already being supplied to the site that constructs a solar array. Hence, we study the relationship between CCA and solar adoption probability to determine whether the tax incentives are additional. Using non-parametric survival analysis, panel data methods, and post-estimation simulations, we discern whether additionality is violated when the ITC is used in areas where a supply of renewable energy is already guaranteed. We find that aggregation programs increase the probability of solar adoption and that, on average in Ohio, roughly $0.44 of every dollar spent on the income tax credit is non-additional. This will help policymakers determine the efficacy of funds allocated to their respective programs.
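A small sketch of the non-parametric survival analysis mentioned above, comparing time-to-solar-adoption inside and outside CCA communities with a hand-rolled Kaplan-Meier estimator (rates, horizon and sample sizes are invented for illustration):

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimator of the survival curve S(t), read here as the
    probability that a household has NOT yet adopted solar by time t."""
    t = np.asarray(times, dtype=float)
    d = np.asarray(events, dtype=int)
    s, curve = 1.0, []
    for u in np.unique(t[d == 1]):           # each observed adoption time
        at_risk = np.sum(t >= u)
        adopted = np.sum((t == u) & (d == 1))
        s *= 1 - adopted / at_risk
        curve.append((u, s))
    return curve

# Toy comparison: households in CCA communities adopt faster in this example.
rng = np.random.default_rng(0)
horizon = 10.0
t_cca = rng.exponential(8.0, 200)
t_out = rng.exponential(14.0, 200)
km_cca = kaplan_meier(np.minimum(t_cca, horizon), t_cca <= horizon)
km_out = kaplan_meier(np.minimum(t_out, horizon), t_out <= horizon)
```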
|
927 |
Statistical analysis of clinical trial data using Monte Carlo methods
Han, Baoguang 11 July 2014 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / In medical research, data analysis often requires complex statistical methods for which no closed-form solutions are available. Under such circumstances, Monte Carlo (MC) methods have found many applications. In this dissertation, we propose several novel statistical models in which MC methods are utilized. In the first part, we focus on semicompeting risks data, in which a non-terminal event is subject to dependent censoring by a terminal event. Based on an illness-death multistate survival model, we propose flexible random effects models. Further, we extend our model to the joint modeling setting, where semicompeting risks data and repeated marker data are analyzed simultaneously. Since the proposed methods involve high-dimensional integrations, Bayesian Markov chain Monte Carlo (MCMC) methods are utilized for estimation. The use of Bayesian methods also facilitates the prediction of individual patient outcomes. The proposed methods are demonstrated in both simulation and case studies.
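For concreteness, one standard way to write an illness-death model with a shared frailty (a plausible instance of the model class above; the thesis's exact specification may differ):

```latex
\begin{align*}
\lambda_1(t_1 \mid \gamma_i) &= \gamma_i\,\lambda_{01}(t_1), & t_1 &> 0 & \text{(healthy $\to$ non-terminal)}\\
\lambda_2(t_2 \mid \gamma_i) &= \gamma_i\,\lambda_{02}(t_2), & t_2 &> 0 & \text{(healthy $\to$ terminal)}\\
\lambda_3(t_2 \mid t_1, \gamma_i) &= \gamma_i\,\lambda_{03}(t_2), & t_2 &> t_1 & \text{(non-terminal $\to$ terminal)}
\end{align*}
% The subject-specific frailty $\gamma_i \sim \mathrm{Gamma}(\theta^{-1}, \theta^{-1})$
% induces the dependence between the non-terminal and terminal event times, and the
% resulting likelihood integrals motivate the Bayesian MCMC estimation described above.
```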
In the second part, we focus on the re-randomization test, a nonparametric method that draws inferences solely from the randomization procedure used in the clinical trial. For this type of inference, the Monte Carlo method is often used to generate the null distribution of the treatment difference. However, an issue was recently discovered when subjects in a clinical trial were randomized, with unbalanced treatment allocation, to two treatments according to the minimization algorithm, a randomization procedure frequently used in practice: the null distribution of the re-randomization test statistic was found not to be centered at zero, which compromised the power of the test. In this dissertation, we investigate the properties of the re-randomization test and propose a weighted re-randomization method to overcome this issue. The proposed method is demonstrated through extensive simulation studies.
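A minimal sketch of the Monte Carlo re-randomization test described above; for simplicity the allocation is plain 2:1 complete randomization, since the minimization algorithm itself is more involved and is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def rerandomization_test(y, treat, randomize, B=5000):
    """Regenerate the allocation B times with the trial's own randomization
    procedure and compare the observed mean difference with the simulated null."""
    stat = lambda a: y[a == 1].mean() - y[a == 0].mean()
    observed = stat(treat)
    null = np.array([stat(randomize()) for _ in range(B)])
    # Under minimization with unbalanced allocation, this null distribution can
    # fail to be centered at zero -- the issue the weighted test corrects.
    return observed, np.mean(np.abs(null) >= np.abs(observed))

n = 150
y = rng.normal(size=n)
def randomize():
    a = np.zeros(n, dtype=int)
    a[rng.choice(n, size=2 * n // 3, replace=False)] = 1   # 2:1 allocation
    return a
observed, p = rerandomization_test(y, randomize(), randomize)
```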
|
928 |
A comparison of the performance of three multivariate methods in investigating the effects of province and power usage on the amount of five power modes in South Africa
Kanyama, Busanga Jerome 06 1900 (has links)
Researchers apply the multivariate techniques MANOVA, discriminant analysis and factor analysis, most commonly in social science, to identify and test effects. The use of these multivariate techniques is uncommon in investigating the effects of power usage and province in South Africa on the amounts of the five power modes. This dissertation discusses this issue, together with the methodology and practical problems of the three multivariate techniques. The author examines the applications of each technique in social and public research, and comparisons are made between the three techniques.
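A short sketch of the three techniques being compared, on synthetic stand-in data (the variable names and values are illustrative, not the study's data):

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.decomposition import FactorAnalysis

# Synthetic amounts of five "power modes" by province and usage level.
rng = np.random.default_rng(0)
n = 300
modes = [f"mode{i}" for i in range(1, 6)]
df = pd.DataFrame(rng.normal(size=(n, 5)), columns=modes)
df["province"] = rng.choice(["Gauteng", "Limpopo", "WesternCape"], n)
df["usage"] = rng.choice(["low", "high"], n)

# 1. MANOVA: joint effect of province and usage on the five amounts.
manova = MANOVA.from_formula(
    "mode1 + mode2 + mode3 + mode4 + mode5 ~ province + usage", data=df)
print(manova.mv_test())

# 2. Discriminant analysis: linear combinations separating the provinces.
lda = LinearDiscriminantAnalysis().fit(df[modes], df["province"])

# 3. Factor analysis: latent structure among the five power modes.
fa = FactorAnalysis(n_components=2).fit(df[modes])
print(fa.components_)
```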
The dissertation concludes with a discussion of both the concepts behind the present multivariate techniques and the results found when applying the three techniques to household energy consumption. The author recommends focusing on the hypotheses of the study, or the typical questions surrounding each technique, to guide the researcher in choosing the appropriate analysis for social research, as each technique has its own strengths and limitations. / Statistics / M. Sc. (Statistics)
|
929 |
ARIMA forecasts of the number of beneficiaries of social security grants in South Africa
Luruli, Fululedzani Lucy 12 1900 (has links)
The main objective of the thesis was to investigate the feasibility of accurately and precisely forecasting the number of both national and provincial beneficiaries of social security grants in South Africa, using simple autoregressive integrated moving average (ARIMA) models. The series of the monthly numbers of beneficiaries of the old age, child support, foster care and disability grants from April 2004 to March 2010 were used to achieve the objectives of the thesis. The conclusions from analysing the series were that: (1) ARIMA models for forecasting are province- and grant-type-specific; (2) for some grants, national forecasts obtained by aggregating provincial ARIMA forecasts are more accurate and precise than those obtained by ARIMA modelling of the national series; and (3) for some grants, forecasts obtained by modelling the latest half of the series were more accurate and precise than those obtained from modelling the full series. / Mathematical Sciences / M.Sc. (Statistics)
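A minimal sketch of the two forecasting routes compared in conclusion (2), on synthetic monthly series standing in for provincial beneficiary counts (all numbers and province subsets are illustrative):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly series, April 2004 to March 2010 (72 observations).
rng = np.random.default_rng(0)
idx = pd.date_range("2004-04-01", periods=72, freq="MS")
provinces = {p: pd.Series(1e5 + np.cumsum(rng.normal(500, 300, 72)), index=idx)
             for p in ["Gauteng", "KwaZulu-Natal", "Limpopo"]}

# Route 1: fit a single ARIMA to the national (summed) series.
national = sum(provinces.values())
fc_national = ARIMA(national, order=(1, 1, 1)).fit().forecast(12)

# Route 2: fit one ARIMA per province and aggregate the provincial forecasts.
fc_aggregated = sum(ARIMA(s, order=(1, 1, 1)).fit().forecast(12)
                    for s in provinces.values())
```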
|
930 |
The development of a criminological intervention model for the Rosslyn industrial environment in Tshwane, Gauteng, South Africa
Pretorius, William Lyon 02 1900 (has links)
The problem investigated in this research is the ongoing crime threat, and the extreme risks that impact negatively on the sustainability of the Rosslyn Industry, the industrial hub of Tshwane in the Gauteng Province of South Africa. Businesses in Rosslyn are desperate for a solution that will mitigate these crime threats and risks and ensure the future sustainability of this important industrial community. An intervention model is urgently required to prevent this type of crime, not only as a short-term solution but as a sustainable long-term intervention.
This research study initiated the collaboration required for the successful implementation of a Crime Prevention Intervention Model (CPIM) in the Rosslyn industrial environment. The intended crime prevention model has been designed in such a way that it addresses the entire environment of crime that prevails in the Rosslyn area, involving both the offender and the victim. This design is rooted in the ontology of environmental criminology and, more specifically, in the applied epistemology of Crime Prevention Through Environmental Design (CPTED).
Participants in this project are the representatives responsible for all security functions in both big businesses and small enterprises. With their dedicated assistance, the research findings disclosed the current crime status of the Rosslyn environment regarding the threat, risk, security vulnerabilities, controls and needs:
• Crime and its causal factors in Rosslyn are rife, and no noteworthy action has been implemented to mitigate these threats.
• Collaboration between the Rosslyn role players (neighbours, local government and law enforcement) is for all practical purposes non-existent.
• To complicate matters even more, knowledge of how to effectively mitigate crime is limited and handicapped by the reactive physical security methods currently in use.
• The implication of these findings is that the status quo will eventually render business in Rosslyn unsustainable. A CPIM in Rosslyn is thus inevitable.
Crucial to this research, and to the CPTED design, was the detailed sourcing of accurate data on the experiences and needs the respondents identified in the current Rosslyn crime situation, concerning its status, threat, risk, security vulnerabilities and controls.
In order to achieve this level of data sourcing and assimilation, the research method was based on a mixed approach in which quantitative and qualitative methods were implemented in parallel. The diverse fields, sources and respondent mix required for a Rosslyn Industry CPIM also necessitated an MIT (multi-, inter- and trans-disciplinary) approach. This MIT requirement was successfully facilitated through the applied criminological CPTED approach.
The CPIM is based on the combined outcomes of the following three research fields:
• Field-one: Environmental criminology theories are researched through an in-depth literature review to demonstrate the criminological grounding of crime prevention and to guide its application through the development of an applied CPTED SUITE.
• Field-two: Supply Chain Security (SCS) is researched through an in-depth literature review to establish its criminological relevance and applications. SCS requirements are identified, built into the Field-three research process, and tested for relevance and for incorporation in the CPTED SUITE.
• Field-three: Based on a mixed research process, using a custom-designed criminological risk analysis tool incorporating scheduled interviews and questionnaires, the crime and needs profile of the Rosslyn Industry is uncovered and analysed. The results are filtered through the CPTED SUITE to indicate the correct criminological approach for mitigating the identified problems and needs.
Even though this study takes an applied crime-prevention approach, the criminological-philosophical mould of crime prevention is imperative for the effective application of CPTED. Without this approach, security and crime prevention training, planning and application will remain underdeveloped and outdated.
Finally, the underlying intention of this research is for this Crime Prevention Intervention Model (CPIM) to be adapted and implemented, and to serve as a guide or benchmark for security practitioners in any industrial environment facing the same crime threats and crime-risk challenges. / Criminology and Security Science / D. Litt. et Phil. (Criminology)
|