931 | A recommender system for e-retail. Walwyn, Thomas. January 2016.
The e-retail sector in South Africa has a significant opportunity to capture a large portion of the country's retail industry. Central to seizing this opportunity is leveraging the advantages that the online setting affords. In particular, the e-retailer can offer an extremely large catalogue of products, far beyond what a traditional retailer is capable of supporting. However, as the catalogue grows, it becomes increasingly difficult for a customer to efficiently discover desirable products. As a consequence, it is important for the e-retailer to develop tools that automatically explore the catalogue for the customer. In this dissertation, we develop a recommender system (RS), whose purpose is to provide suggestions for the products most likely to interest a particular customer. This dissertation makes two primary contributions. First, we describe a set of six characteristics that all effective RSs should possess, namely: accuracy, responsiveness, durability, scalability, model management, and extensibility. Second, we develop an RS that is capable of serving recommendations in an actual e-retail environment. The design of the RS is an attempt to embody the characteristics mentioned above. In addition, to show how the RS supports model selection, we present a proof-of-concept experiment comparing two popular methods for generating recommendations that we implement for this dissertation: implicit matrix factorisation (IMF) and Bayesian personalised ranking (BPR).
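For context, a minimal sketch of the implicit-feedback matrix factorisation idea behind IMF (alternating least squares over confidence-weighted interactions, following Hu, Koren and Volinsky), applied to a tiny synthetic interaction matrix; the data, parameter values and function names are illustrative and not taken from the dissertation's implementation.

```python
import numpy as np

def implicit_als(R, factors=8, reg=0.1, alpha=40.0, iters=10, seed=0):
    """Alternating least squares for implicit feedback (Hu et al., 2008).

    R: (users x items) matrix of non-negative interaction counts.
    Returns user and item factor matrices X, Y.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = 0.01 * rng.standard_normal((n_users, factors))
    Y = 0.01 * rng.standard_normal((n_items, factors))
    C = 1.0 + alpha * R          # confidence weights
    P = (R > 0).astype(float)    # binary preference matrix

    for _ in range(iters):
        # Solve for each user's factors with item factors fixed, then swap roles.
        for U, V, Cmat, Pmat in ((X, Y, C, P), (Y, X, C.T, P.T)):
            VtV = V.T @ V
            for i in range(U.shape[0]):
                Cu = Cmat[i]                          # confidences for this row
                A = VtV + V.T @ ((Cu - 1.0)[:, None] * V) + reg * np.eye(factors)
                b = V.T @ (Cu * Pmat[i])
                U[i] = np.linalg.solve(A, b)
    return X, Y

# Toy usage: recommend the highest-scoring unseen item for user 0.
R = np.array([[3, 0, 1, 0], [0, 2, 0, 4], [1, 0, 0, 2]], dtype=float)
X, Y = implicit_als(R)
scores = X[0] @ Y.T
scores[R[0] > 0] = -np.inf
print("recommended item for user 0:", int(np.argmax(scores)))
```

BPR differs mainly in its objective: rather than a confidence-weighted squared error, it optimises a pairwise ranking loss over (user, preferred item, other item) triples, so that observed items are scored above unobserved ones.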
932 | Enhanced minimum variance optimisation: a pragmatic approach. Lakhoo, Lala Bernisha Janti. January 2016.
Since the establishment of Markowitz's theory, numerous studies over the past six decades have covered the benefits, limitations, modifications and enhancements of mean-variance (MV) optimisation. This study extends that work by adding factors to the minimum variance framework in order to increase the likelihood of outperforming both the market and the minimum variance portfolio (MVP). An analysis of the impact of these factor tilts on the MVP is carried out in the South African environment, with the FTSE/JSE Shareholder Weighted Index as the benchmark portfolio. The main objective is to examine whether the systematic and robust methods employed, which incorporate factor tilts into the multicriteria problem together with covariance shrinkage, improve the performance of the MVP. The factor tilts examined are Active Distance, Concentration and Volume. Additionally, the constant correlation model is employed in the estimation of the shrinkage intensity, the structured covariance target and the shrinkage estimator. The results show that, at specific levels of factor tilting, one can generally improve both absolute and risk-adjusted performance and lower concentration levels relative to both the MVP and the benchmark. Lower turnover levels were also observed across all tilted portfolios relative to the MVP. Furthermore, covariance shrinkage enhanced all portfolio statistics examined, with the most significant improvement in drawdown levels, capture ratios and risk; this is in contrast to the results obtained when the standard sample covariance matrix was employed.
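As an illustration of the covariance-shrinkage ingredient, a minimal sketch that builds a constant-correlation shrinkage target and computes unconstrained minimum variance weights from the shrunk matrix; the shrinkage intensity is fixed at an assumed value rather than estimated, and the returns are simulated rather than drawn from the SWIX universe.

```python
import numpy as np

def constant_correlation_target(returns):
    """Shrinkage target: sample variances with a single average correlation."""
    S = np.cov(returns, rowvar=False)
    sd = np.sqrt(np.diag(S))
    corr = S / np.outer(sd, sd)
    n = corr.shape[0]
    rbar = (corr.sum() - n) / (n * (n - 1))   # mean off-diagonal correlation
    F = rbar * np.outer(sd, sd)
    np.fill_diagonal(F, sd ** 2)
    return S, F

def min_variance_weights(cov):
    """Unconstrained minimum variance portfolio: w = S^{-1} 1 / (1' S^{-1} 1)."""
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)
    return w / w.sum()

# Toy usage with simulated monthly returns for 5 assets.
rng = np.random.default_rng(1)
returns = rng.normal(0.01, 0.05, size=(120, 5))
S, F = constant_correlation_target(returns)
delta = 0.3                      # assumed shrinkage intensity (normally estimated)
sigma_shrunk = delta * F + (1 - delta) * S
print(np.round(min_variance_weights(sigma_shrunk), 3))
```

Shrinking towards the structured target reduces estimation error in the sample covariance matrix, which matters because the minimum variance weights invert that matrix and amplify any noise in it.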
933 | The estimation of missing values in hydrological records using the EM algorithm and regression methods. Makhuvha, Tondani. January 1988.
The objective of this thesis is to review existing methods for estimating missing values in rainfall records and to propose a number of new procedures. Two classes of methods are considered. The first is based on the theory of variable selection in regression; here the emphasis is on finding efficient methods to identify the set of control stations that are likely to yield the best regression estimates of the missing values at the target station. The second class of methods is based on the EM algorithm proposed by Dempster, Laird and Rubin (1977); here the emphasis is on estimating the missing values directly, without first making a detailed selection of control stations, so that all "relevant" stations are included. This method has not previously been applied in the context of estimating missing rainfall values.
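A minimal sketch of the EM idea for a record with gaps, assuming a multivariate normal model for a target station and its control stations; this is a generic EM imputation written for illustration, not the specific procedures proposed in the thesis.

```python
import numpy as np

def em_mvn_impute(X, iters=50):
    """EM imputation assuming rows are i.i.d. multivariate normal.

    X: data matrix with np.nan for missing entries (stations in columns).
    Returns the completed matrix and the fitted mean and covariance.
    """
    X = X.astype(float).copy()
    n, p = X.shape
    miss = np.isnan(X)
    # Start from column means and the covariance of the crudely filled data.
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])
    mu, sigma = X.mean(axis=0), np.cov(X, rowvar=False)

    for _ in range(iters):
        correction = np.zeros((p, p))
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            S_oo = sigma[np.ix_(o, o)]
            S_mo = sigma[np.ix_(m, o)]
            # E-step: conditional mean of missing given observed.
            coef = S_mo @ np.linalg.inv(S_oo)
            X[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            # Conditional covariance of the missing block feeds the M-step.
            correction[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ S_mo.T
        # M-step: update parameters from the completed data.
        mu = X.mean(axis=0)
        diff = X - mu
        sigma = (diff.T @ diff + correction) / n
    return X, mu, sigma

# Toy usage: 3 stations, a few missing observations at the "target" station.
rng = np.random.default_rng(0)
data = rng.multivariate_normal([50, 60, 55],
                               [[9, 6, 5], [6, 10, 6], [5, 6, 8]], size=40)
data[[3, 7, 12], 0] = np.nan
completed, mu_hat, sigma_hat = em_mvn_impute(data)
print(np.round(completed[[3, 7, 12], 0], 1))
```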
934 | Statistical Analysis of Linear Analog Circuits Using Gaussian Message Passing in Factor Graphs. Phadnis, Miti. 01 December 2009.
This thesis introduces a novel application of factor graphs to the domain of analog circuits. It proposes a technique that leverages factor graphs to perform statistical yield analysis of analog circuits much faster than standard Monte Carlo/Simulation Program with Integrated Circuit Emphasis (SPICE) simulation techniques. We have designed a toolchain that models an analog circuit and its corresponding factor graph and then uses Gaussian message passing along the edges of the graph for yield calculation. The tool can also estimate unknown parameters of the circuit from known output statistics through backward message propagation in the factor graph. The tool builds upon the concept of domain-specific modeling for modeling and interpreting different kinds of analog circuits. The Generic Modeling Environment (GME), a configurable tool set that supports the creation of domain-specific design environments for different applications, is used to build the modeling environment for analog circuits. This research has developed a generalized methodology that could be applied to design automation of different kinds of analog circuits, both linear and nonlinear. The tool has been successfully used to model linear amplifier circuits and a nonlinear Metal Oxide Semiconductor Field Effect Transistor (MOSFET) circuit. Monte Carlo simulation results for these circuits serve as the reference against which the tool's results are compared, and the tool is evaluated for efficiency in terms of run time and accuracy relative to this standard.
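A minimal sketch of the underlying idea, with assumed component tolerances and an assumed gain specification: propagate Gaussian means and variances through a linearised circuit relation (a forward message) to obtain a yield estimate, and compare it with the Monte Carlo reference that the thesis also uses as its baseline.

```python
import numpy as np
from scipy import stats

# Inverting amplifier nominal design: gain G = -R2 / R1.
R1_mean, R1_sd = 1e3, 1e3 * 0.05     # assumed 5% tolerance resistors
R2_mean, R2_sd = 10e3, 10e3 * 0.05
spec_low, spec_high = -10.5, -9.5    # assumed acceptable gain window

# First-order Gaussian propagation, in the spirit of a forward message:
# treat G as approximately Gaussian with mean at the nominal point and variance
# from the partial derivatives dG/dR1 = R2/R1^2 and dG/dR2 = -1/R1.
g_mean = -R2_mean / R1_mean
dG_dR1 = R2_mean / R1_mean**2
dG_dR2 = -1.0 / R1_mean
g_var = (dG_dR1 * R1_sd) ** 2 + (dG_dR2 * R2_sd) ** 2
g_dist = stats.norm(g_mean, np.sqrt(g_var))
yield_gaussian = g_dist.cdf(spec_high) - g_dist.cdf(spec_low)

# Monte Carlo reference, the baseline the thesis compares against.
rng = np.random.default_rng(0)
r1 = rng.normal(R1_mean, R1_sd, 200_000)
r2 = rng.normal(R2_mean, R2_sd, 200_000)
g = -r2 / r1
yield_mc = np.mean((g > spec_low) & (g < spec_high))

print(f"Gaussian-propagation yield: {yield_gaussian:.4f}")
print(f"Monte Carlo yield:          {yield_mc:.4f}")
```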
935 | Assessment of Potential Changes in Crop Yields in the Central United States Under Climate Change Regimes. Matthews-Pennanen, Neil. 01 May 2018.
Climate change is one of the great challenges facing agriculture in the 21st century. The goal of this study was to produce projections of crop yields for the central United States in the 2030s, 2060s, and 2090s, based on the relationship between weather and yield in historical crop records from 1980 to 2010. These projections were made across 16 states in the US, from Louisiana in the south to Minnesota in the north, and cover maize, soybeans, cotton, spring wheat, and winter wheat.
Simulated weather variables based on three climate scenarios were used to project future crop yields. In addition, factors of soil characteristics, topography, and fertilizer application were used in the crop production models. Two technology scenarios were used: one simulating a future in which crop technology continues to improve and the other a future in which crop technology remains similar to where it is today.
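A minimal sketch of the general model structure described here, using entirely made-up data and coefficients: a regression of yield on weather, soil, fertilizer and a technology trend, projected under an assumed warmer scenario with two technology futures.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative only: county-level yield data are simulated to show the general
# structure (weather + soil + fertilizer + a technology trend term).
rng = np.random.default_rng(42)
n = 500
X_hist = np.column_stack([
    rng.normal(22, 2, n),      # growing-season mean temperature (C)
    rng.normal(450, 80, n),    # growing-season precipitation (mm)
    rng.normal(0.6, 0.1, n),   # soil quality index
    rng.normal(150, 30, n),    # fertilizer application (kg/ha)
    rng.integers(0, 31, n),    # years since 1980 (technology trend)
])
yield_hist = (9.0 - 0.15 * X_hist[:, 0] + 0.004 * X_hist[:, 1]
              + 2.0 * X_hist[:, 2] + 0.01 * X_hist[:, 3]
              + 0.08 * X_hist[:, 4] + rng.normal(0, 0.5, n))

model = LinearRegression().fit(X_hist, yield_hist)

# Project a warmer, slightly drier scenario under two technology futures.
scenario = np.array([[24.5, 420, 0.6, 150, 80],   # technology keeps improving
                     [24.5, 420, 0.6, 150, 30]])  # technology frozen near today
print(np.round(model.predict(scenario), 2))
```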
Results showed future crop yields to be responsive to both the different climate scenarios and the different technology scenarios. The effects of a changing climate regime on crop yields varied both geographically throughout the study area and from crop to crop. One broad geographic trend was greater potential for crop yield losses in the south and greater potential for gains in the north.
Whether new technologies enable crop yields to continue to increase as the climate becomes less favorable is a major factor in agricultural production in the coming century. Results of this study indicate that the degree to which society relies on these new technologies will depend largely on the degree of warming that occurs.
Continued research into the potential negative impacts of climate change on the current crop system in the United States is needed to mitigate the widespread losses in crop productivity that could result. In addition to studying negative impacts, research should be undertaken to identify potential new opportunities for crop development under the higher temperatures brought by climate change. Studies like this one, with a broad geographic range, should be complemented by studies of narrower scope that can manipulate climatic variables under controlled conditions. Investment in these types of agricultural studies will give the agricultural sector in the United States greater tools with which to mitigate the disruptive effects of a changing climate.
936 | Tuning Hyperparameters in Supervised Learning Models and Applications of Statistical Learning in Genome-Wide Association Studies with Emphasis on Heritability. Lundell, Jill F. 01 August 2019.
Machine learning is a buzzword that has inundated popular culture in the last few years. It refers to computer methods that automatically learn and improve from data instead of being explicitly programmed at every step. Investigations into the best way to create and use these methods are prevalent in research. Machine learning models can be difficult to build because they need to be tuned. This dissertation explores the characteristics of tuning three popular machine learning models and develops a way to automatically select a set of tuning parameters. This work was used to create an R software package called EZtune that automatically tunes three widely used machine learning algorithms: support vector machines, gradient boosting machines, and AdaBoost.
The second portion of this dissertation investigates the use of machine learning methods to find locations along a genome that are associated with a trait. The performance of methods commonly used in these types of studies, and of some that have not been commonly used, is assessed using simulated data. The effect of the strength of the relationship between the genetic code and the trait is of particular interest. It was found that the strength of this relationship was the most important determinant of each method's efficacy.
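EZtune itself is an R package; as a language-neutral illustration of the automated-tuning idea, a minimal sketch using random search over the two main hyperparameters of an RBF support vector machine, with an assumed toy dataset and assumed search ranges.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Toy data standing in for a real classification problem.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Random search over the two hyperparameters that matter most for an RBF SVM.
search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    param_distributions={"C": loguniform(1e-2, 1e3),
                         "gamma": loguniform(1e-4, 1e1)},
    n_iter=25, cv=5, random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```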
937 | An empirical evaluation of the Altman (1968) failure prediction model on South African JSE listed companies. Rama, Kavir D. 18 March 2013.
Credit has become very important in the global economy (Cynamon and Fazzari, 2008). The Altman (1968) failure prediction model, or derivatives thereof, is often used in the identification and selection of financially distressed companies, as it is recognized as one of the most reliable models for predicting company failure (Eidleman, 1995). Failure of a firm can cause substantial losses to creditors and shareholders, so it is important to detect company failure as early as possible. This research report empirically tests the Altman (1968) failure prediction model on 227 South African JSE listed companies, using data from the 2008 financial year to calculate the Z-score within the model and measuring the success or failure of firms in the 2009 and 2010 years. The results indicate that the Altman (1968) model is a viable tool for predicting company failure for firms with positive Z-scores whose Z-scores do not fall into the range of uncertainty as specified. The results also suggest that the model is not reliable when Z-scores are negative or when they fall in the range of uncertainty (between 1.81 and 2.99). If one is able to predict firm failure in advance, it should be possible for management to take steps to avert such an occurrence (Deakin, 1972; Keasey and Watson, 1991; Platt and Platt, 2002).
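For reference, a minimal sketch of the original Altman (1968) Z-score and its conventional zones (distress below 1.81, a grey area between 1.81 and 2.99, and a safe zone above 2.99); the firm figures in the example are hypothetical.

```python
def altman_z_1968(working_capital, retained_earnings, ebit,
                  market_value_equity, sales, total_assets, total_liabilities):
    """Original Altman (1968) Z-score for listed manufacturing firms."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

def classify(z):
    if z > 2.99:
        return "safe zone"
    if z < 1.81:
        return "distress zone"
    return "zone of uncertainty (grey area)"

# Hypothetical firm figures, e.g. in millions of rand.
z = altman_z_1968(working_capital=120, retained_earnings=300, ebit=90,
                  market_value_equity=800, sales=1500,
                  total_assets=1000, total_liabilities=400)
print(round(z, 2), "->", classify(z))
```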
938 | Is the way forward to step back? A meta-research analysis of misalignment between goals, methods, and conclusions in epidemiologic studies. Kezios, Katrina Lynn. January 2021.
Recent discussion in the epidemiologic methods and teaching literatures centers around the importance of clearly stating study goals, disentangling the goal of causation from prediction (or description), and clarifying the statistical tools that can address each goal. This discussion illuminates different ways in which mismatches can occur between study goals, methods, and interpretations, which this dissertation synthesizes into the concept of “misalignment”; misalignment occurs when the study methods and/or interpretations are inappropriate for (i.e., do not match) the study’s goal. While misalignments can occur and may cause problems, their pervasiveness and consequences have not been examined in the epidemiologic literature. Thus, the overall purpose of this dissertation was to document and examine the effects of misalignment problems seen in epidemiologic practice.
First, a review was conducted to document misalignment in a random sample of epidemiologic studies and explore how the framing of study goals contributes to its occurrence. Among the reviewed articles, full alignment between study goals, methods, and interpretations was infrequently observed, although “clearly causal” studies (those that framed causal goals using causal language) were more often fully aligned (5/13, 38%) than “seemingly causal” ones (those that framed causal goals using associational language; 3/71, 4%).
Next, two simulation studies were performed to examine the potential consequences of different types of misalignment problems seen in epidemiologic practice. They are based on the observation that, often, studies that are causally motivated perform analyses that appear disconnected from, or “misaligned” with, their causal goal.
A primary aim of the first simulation study was to examine goal–methods misalignment in terms of inappropriate variable selection for exposure effect estimation (a causal goal). The main difference between predictive and causal models is the conceptualization and treatment of "covariates". Therefore, exposure coefficients were compared from regression models built using different variable selection approaches that were either aligned (appropriate for causation) or misaligned (appropriate for prediction) with the causal goal of the simulated analysis. The regression models were characterized by different combinations of variable pools and inclusion criteria to select variables from the pools into the models. Overall, for valid exposure effect estimation in a causal analysis, the creation of the variable pool mattered more than the specific inclusion criteria, and the most important criterion when creating the variable pool was to exclude mediators.
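A minimal sketch of why excluding mediators matters, with assumed data-generating values rather than the dissertation's simulation design: including a mediator in the adjustment set returns the direct rather than the total exposure effect, while adjusting only for the confounder recovers the total effect.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2021)
n = 50_000
confounder = rng.normal(size=n)
exposure = 0.8 * confounder + rng.normal(size=n)
mediator = 0.6 * exposure + rng.normal(size=n)
# True total effect of exposure on outcome: 0.5 (direct) + 0.6 * 0.4 (via mediator) = 0.74
outcome = 0.5 * exposure + 0.4 * mediator + 0.7 * confounder + rng.normal(size=n)

def coef_on_exposure(covariates):
    """OLS coefficient on exposure after adjusting for the given covariates."""
    X = sm.add_constant(np.column_stack([exposure] + covariates))
    return sm.OLS(outcome, X).fit().params[1]

print("adjust for confounder only (aligned):      ",
      round(coef_on_exposure([confounder]), 3))
print("adjust for confounder + mediator (misaligned):",
      round(coef_on_exposure([confounder, mediator]), 3))
```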
The second simulation study concretized the misalignment problem by examining the consequences of goal–method misalignment in the application of the structured life course approach, a statistical method for distinguishing among different causal life course models of disease (e.g., critical period, accumulation of risk). Although exchangeability must be satisfied for valid results using this approach, confounding is often ignored in its empirical applications. These applications are misaligned because they use methods for description (crude associations) for a causal goal (identifying causal processes). Simulations were used to mimic this misaligned approach and examine its consequences. On average, when life course data were generated under a "no confounding" scenario (an unlikely real-world scenario), the structured life course approach was quite accurate in identifying the life course model that generated the data. In the presence of confounding, however, the wrong underlying life course model was often identified. Five life course confounding structures were examined; as the complexity of the confounding scenarios increased, and particularly when the confounding was strong, incorrect model selection using the structured life course approach was common.
The misalignment problem is recognized but underappreciated in the epidemiologic literature. This dissertation contributes to the literature by documenting, simulating, and concretizing problems of misalignment in epidemiologic practice.
939 | Estimation and the Stress-Strength Model. Brownstein, Naomi. 01 January 2007.
The paper considers statistical inference for R = P(X < Y) in the case when both X and Y have generalized gamma distributions. The maximum likelihood estimators for R are developed for the cases in which either all three parameters of the generalized gamma distributions are unknown or the shape parameters are known. In addition, objective Bayes estimators based on noninformative priors are constructed when the shape parameters are known. Finally, the uniform minimum variance unbiased estimators (UMVUE) are derived in the case when only the scale parameters are unknown.
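A minimal sketch of how R = P(X < Y) can be evaluated for assumed generalized gamma distributions, via numerical integration of ∫ f_Y(y) F_X(y) dy together with a simulation check; the parameter values are illustrative.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Assumed generalized gamma parameters (scipy's gengamma uses shapes a, c and a scale).
X = stats.gengamma(a=2.0, c=1.5, scale=1.0)   # "stress"
Y = stats.gengamma(a=3.0, c=1.5, scale=1.2)   # "strength"

# R = P(X < Y) = integral over y of f_Y(y) * F_X(y).
R_numeric, _ = quad(lambda y: Y.pdf(y) * X.cdf(y), 0, np.inf)

# Monte Carlo check with independent draws from each distribution.
R_mc = np.mean(X.rvs(200_000, random_state=7) < Y.rvs(200_000, random_state=8))

print(f"R by numerical integration: {R_numeric:.4f}")
print(f"R by simulation:            {R_mc:.4f}")
```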
940 | Marginal modelling of capture-recapture data. Turner, Elizabeth L. January 2007.
No description available.