381 |
Risk evaluation techniques in a general insurance environment
Van den Heever, Rudolf Johannes 31 October 2005
Please read the abstract in the section 00front of this document / Dissertation (MCom (Actuarial Science))--University of Pretoria, 2005. / Insurance and Actuarial Science / unrestricted
|
382 |
Automated statistical audit system for a government regulatory authority
Xozwa, Thandolwethu January 2015
Governments all over the world face numerous challenges in running their countries on a daily basis. The predominant challenges are those that involve statistical methodologies. Official statistics are very important to South Africa's infrastructure, so it is important to reduce the challenges that occur during their development. For official statistics to be developed successfully, quality standards need to be built into an organisational framework and form a system of architecture (Statistics New Zealand 2009:1). This study therefore seeks to develop an appropriate and scientifically correct statistical methodology, using an automated statistical system for audits in government regulatory authorities. The study makes use of Mathematica to provide guidelines on how to develop and use an automated statistical audit system. A comprehensive literature study was conducted using existing secondary sources. A quantitative research paradigm was adopted to empirically assess the demographic characteristics of tenants of Social Housing Estates and their perceptions of the rental units they inhabit. More specifically, a descriptive study was undertaken. Furthermore, a sample was selected by means of convenience sampling for a case study on SHRA to assess the respondents' biographical information. From this sample, a pilot study was conducted to investigate the respondents' general perceptions of the physical condition and quality of their units. The technical development of an automated statistical audit system is discussed. This process involved the development and use of a questionnaire design tool, statistical analysis and reporting, and an account of how Mathematica software served as a platform for developing the system.
The findings of this study provide insights into how government regulatory authorities can best utilise automated statistical audits for regulatory purposes; the study achieved this by developing an automated statistical audit system for such authorities. It is hoped that the findings will provide government regulatory authorities with practical suggestions or solutions regarding the generation of official statistics for regulatory purposes, and that the suggestions for future research will inspire researchers to further investigate automated statistical audit systems, statistical analysis, automated questionnaire development, and government regulatory authorities individually.
|
383 |
Možnost zavedení a využívání metody SPC ve výrobě v organizaci s.n.o.p CZ, a.s. / The possibility of implementation and use of SPC methods in the production of organization s.n.o.p CZ, a.s.
Kouba, Pavel January 2009
This diploma thesis verifies the application of SPC methods and evaluates the statistical stability and process capability of steel stampings in a real production process. In the second part, the author attempts to design the optimal form of SPC methods for use in the specified manufacturing process.
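The capability evaluation mentioned in this abstract is commonly summarised by the indices Cp and Cpk. The following is a minimal sketch of that standard calculation, not the procedure used in the thesis; the function name, the measurements, and the specification limits are all hypothetical.

```python
from statistics import mean, stdev


def process_capability(samples, lsl, usl):
    """Return (Cp, Cpk) for measurements against lower/upper spec limits."""
    mu = mean(samples)
    sigma = stdev(samples)  # sample standard deviation
    # Cp compares the tolerance width with the natural process spread (6 sigma).
    cp = (usl - lsl) / (6 * sigma)
    # Cpk additionally penalises a process mean that is off-centre.
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk


# Hypothetical stamping dimension in mm, with assumed spec limits.
measurements = [10.02, 9.98, 10.01, 9.99, 10.00, 10.03, 9.97, 10.00]
cp, cpk = process_capability(measurements, lsl=9.90, usl=10.10)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

Both indices are only meaningful once the process is statistically stable, which is why stability is usually checked with control charts before capability is computed.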
|
384 |
Statistická analýza ve webovém prostředí / Statistical Analysis in Web Environment
Postler, Štěpán January 2013
The aim of this thesis is to create a web application that allows users to import datasets and analyze data using statistical methods. The application provides user access control that allows multiple people to work with a single dataset and to interact with each other. Data are stored on a remote server, and the application is accessible from any computer connected to the Internet. The application is written in PHP with a MySQL database, and the user interface is built in HTML with CSS styles. All parts of the application are stored on an attached CD as text files. In addition to the web application, the thesis includes a written part consisting of a theoretical section, which describes the chosen statistical analysis methods, and a practical section, which lists the application's functions, describes the data model, and demonstrates the data analysis options on specific examples.
|
385 |
Local parametric poisson models for fisheries data
Yee, Irene Mei Ling January 1988
The Poisson process is a common model for count data. However, a global Poisson model is inadequate for sparse data such as the marked salmon recovery data, which have huge extraneous variation and noise. An empirical Bayes model, which enables information to be aggregated to overcome the lack of information from data in individual cells, is thus developed to handle these data. The method fits a local parametric Poisson model to describe the variation at each sampling period and combines this approach with a conventional local smoothing technique to remove noise. Finally, the overdispersion relative to the Poisson model is modelled by mixing these locally smoothed Poisson models in an appropriate way. This method is then applied to the marked salmon data to obtain the overall patterns and the corresponding credibility intervals for the underlying trend in the data. / Science, Faculty of / Statistics, Department of / Graduate
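The idea of pooling information across sparse cells can be illustrated with a textbook gamma-Poisson empirical Bayes estimator. This is a sketch under stated assumptions and not the local parametric model developed in the thesis: the gamma prior, the method-of-moments fit, the function name, and the example counts are all hypothetical.

```python
from statistics import mean, variance


def eb_poisson_shrinkage(counts):
    """Shrink per-cell Poisson rates toward the pooled mean.

    Assumes count_i ~ Poisson(rate_i) and rate_i ~ Gamma(alpha, beta),
    with alpha and beta fitted by the method of moments.
    """
    m = mean(counts)
    v = variance(counts)
    if v <= m:
        # No overdispersion detected: fall back to the pooled rate.
        return [float(m)] * len(counts)
    beta = m / (v - m)   # from marginal variance = m + m / beta
    alpha = m * beta     # from marginal mean = alpha / beta
    # Posterior mean of each rate under the fitted gamma prior.
    return [(alpha + y) / (beta + 1) for y in counts]


# Hypothetical sparse recovery counts for eight cells.
counts = [0, 0, 1, 0, 12, 0, 2, 1]
rates = eb_poisson_shrinkage(counts)
print([round(r, 2) for r in rates])
```

Each estimate is a weighted average of the cell's own count and the pooled mean, so cells with little information borrow strength from the rest.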
|
386 |
Essays on strategic trading, asymmetric information, and asset pricing
Peterson, David John 05 1900
This thesis presents three models of asset pricing involving non-competitive behavior and asymmetric
information. In the first model, a risk averse investor with private information about
dividends trades shares over an infinite time horizon with risk neutral uninformed agents. The
informed investor trades strategically in equilibrium. The second model also involves an infinite
time horizon, but all agents are risk averse and equally informed about dividends. Non-competitive
behavior is exogenously specified; price takers trade shares with a strategic investor
who accounts for the effects of her trades on the stock price. In this case, an endogenous information
asymmetry arises in equilibrium. Closed form equilibria are derived for both models and
implications for price dynamics are explored. While the first model constitutes a new extension
of the multiperiod Kyle model of insider trading, the second model generates more interesting
price dynamics. If the strategic investor manages a large mutual fund, significant risk premia
and price volatility may arise in equilibrium. In fact, if mutual fund participation is sufficiently
widespread, multiple equilibria may exist. The third model extends the multiperiod Kyle model
to a case where the insider observes a noisy signal of the stock's terminal liquidation value. An
equilibrium much like Kyle's is derived. Price tends toward value over time, and stock price
volatility depends on both the drift and volatility of the insider's private signal. Like the Kyle
model, the insider's trading activity leaves no detectable trace in trading volume, expected
returns, or price volatility. / Business, Sauder School of / Finance, Division of / Graduate
|
387 |
Les anisotropies du fond diffus infrarouge : un nouvel outil pour sonder l'évolution des structures / The anisotropies of the cosmic infrared background: a new tool to probe the evolution of structure
Penin, Aurelie 26 September 2011
The cosmic infrared background is the contribution of all infrared galaxies integrated over the history of the Universe. It emits between 8 and 1000 µm, with a peak around 200 µm. A large fraction of this background is resolved into sources in the near infrared, but only a small fraction is resolved in the mid and far infrared because of confusion. The faintest sources are lost in the confusion noise, which forms brightness fluctuations: the anisotropies of the cosmic infrared background. Studying these fluctuations makes it possible to study galaxies below the detection threshold, that is, the faintest galaxies. The power spectrum measures the power contained in these fluctuations as a function of spatial scale. This measurement contains, among other things, the clustering of infrared galaxies. First, I isolated from the power spectrum of an infrared map the power spectrum due only to infrared galaxies. At large spatial scales, it is contaminated by the emission of Galactic cirrus. These cirrus are clouds of neutral hydrogen traced by the 21 cm line. I therefore used 21 cm data to estimate the infrared emission of these cirrus, which I then subtracted from infrared maps at 100 and 160 µm. This also enabled me to make a precise measurement of the absolute level of the cosmic infrared background at these wavelengths. In order to analyse these power spectra, I developed a model of the clustering of infrared galaxies that links a halo model to a model of infrared galaxy evolution that reproduces existing data, including those from Herschel. It is a fully parametric model, which enables the study of the degeneracies between its parameters. I also derived physical quantities and their evolution with wavelength. Furthermore, I fitted existing data from 100 to 1380 µm. With the model, the redshift contributions at each wavelength can be determined. Short wavelengths probe low redshifts, whereas long wavelengths probe high redshifts. However, the contribution of low redshifts is far from negligible at long wavelengths. Determining the evolution of the clustering with redshift requires maps of the anisotropies of the cosmic infrared background. I will detail a component separation method dedicated to this problem.
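The notion of power as a function of scale can be sketched with a one-dimensional analogue. The thesis works with two-dimensional sky maps, so the function below is only a simplified illustration: it uses a naive discrete Fourier transform, the function name and normalisation are assumptions, and the input signal is synthetic.

```python
import cmath
import math


def power_spectrum(signal):
    """Return |DFT|^2 / n for the non-negative frequencies of a real signal."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive O(n^2) discrete Fourier coefficient at frequency index k.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


# Synthetic brightness profile: a single fluctuation with 3 cycles per scan.
n_samples = 32
profile = [math.cos(2 * math.pi * 3 * t / n_samples) for t in range(n_samples)]
ps = power_spectrum(profile)
peak = max(range(len(ps)), key=ps.__getitem__)
print(peak)
```

Small frequency indices correspond to large spatial scales, which is where the abstract reports contamination from Galactic cirrus.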
|
388 |
Advances in imbalanced data learning
Lu, Yang 29 August 2019
With the increasing availability of large amounts of data in a wide range of applications, in both industry and academia, it becomes crucial to understand the nature of complex raw data in order to gain more value from data engineering. Although many problems have been successfully solved by mature machine learning techniques, learning from imbalanced data remains one of the challenges in data engineering and machine learning, and it has attracted growing attention in recent years due to its complexity. In this thesis, we focus on four aspects of imbalanced data learning and propose solutions to the key problems. The first aspect concerns ensemble methods for imbalanced data classification. Ensemble methods, e.g. bagging and boosting, have the advantage that they can handle imbalanced data when integrated with sampling methods. However, there are still problems in the integration. One problem is that undersampling and oversampling are complementary to each other and the sampling ratio is crucial to classification performance. This thesis introduces a new method, HSBagging, which is based on bagging with hybrid sampling. Experiments show that HSBagging outperforms other state-of-the-art bagging methods on imbalanced data. Another problem concerns the integration of boosting and sampling for imbalanced data classification. The classifier weights of existing AdaBoost-based methods are inconsistent with the objective of class imbalance classification. In this thesis, we propose a novel boosting optimization framework, GOBoost. This framework can be applied to any boosting-based method for class imbalance classification by simply replacing the calculation of classifier weights. Experiments show that the GOBoost-based methods significantly outperform the corresponding boosting-based methods. The second aspect concerns online learning for imbalanced data streams with concept drift.
In the online learning scenario, if the data stream is imbalanced, it is difficult to detect concept drifts and adapt the online learner to them. The ensemble classifier weights are hard to adjust so as to balance stability and adaptability. Besides, a classifier built on the samples in a fixed-size chunk, which may be highly imbalanced, is unstable in the ensemble. In this thesis, we propose Adaptive Chunk-based Dynamic Weighted Majority (ACDWM), which dynamically weights the individual classifiers according to their performance on the current data chunk. Meanwhile, the chunk size is adaptively selected by statistical hypothesis tests. Experiments on both synthetic and real datasets with concept drift show that ACDWM outperforms both state-of-the-art chunk-based and online methods. In addition to imbalanced data classification, the third aspect concerns clustering on imbalanced data. This thesis studies a key problem of imbalanced data clustering within the k-means-type framework, called the uniform effect, where the clustering results tend to be balanced. To address it, this thesis introduces a new method called Self-adaptive Multi-prototype-based Competitive Learning (SMCL) for imbalanced clusters. It uses multiple subclusters to represent each cluster, with automatic adjustment of the number of subclusters. The subclusters are then merged into the final clusters based on a novel separation measure. Experimental results show the efficacy of SMCL for imbalanced clusters and its superiority over its competitors. Rather than a specific algorithm for imbalanced data learning, the final aspect concerns a measure of class-imbalanced datasets for classification. Recent studies have shown that the imbalance ratio is not the only cause of the performance loss of a classifier in imbalanced data classification.
To the best of our knowledge, there is no existing measure of the extent to which class imbalance influences the classification performance on imbalanced data. Accordingly, this thesis proposes a data measure called the Bayes Imbalance Impact Index (BI³) to reflect the extent of influence due purely to the factor of imbalance for the whole dataset. We can therefore use BI³ to judge whether it is worth using imbalance recovery methods, such as sampling or cost-sensitive methods, to recover the performance loss of a classifier. The experiments show that BI³ is highly consistent with the improvement in F1 score made by the imbalance recovery methods on both synthetic and real benchmark datasets.
1. Two ensemble frameworks for imbalanced data classification are proposed for sampling rate selection and boosting weight optimization, respectively.
2. A chunk-based online learning algorithm is proposed to dynamically adjust the ensemble classifiers and select the chunk size for imbalanced data streams with concept drift.
3. A multi-prototype competitive learning algorithm is proposed for clustering on imbalanced data.
4. A measure of imbalanced data is proposed to evaluate how the classification performance of a dataset is influenced by the factor of imbalance.
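The interplay between undersampling and oversampling that the abstract describes can be sketched as follows. This is not HSBagging as defined in the thesis, whose details the abstract does not give; it is a generic hybrid-sampling step for building one balanced bag, and the function name, the `ratio` parameter, and its interpretation are assumptions made for illustration.

```python
import random


def hybrid_sample(majority, minority, ratio=0.5, rng=None):
    """Build one class-balanced bag by hybrid sampling.

    ratio = 0 gives pure undersampling (both classes at the minority size);
    ratio = 1 gives pure oversampling (both classes at the majority size).
    Assumes len(minority) <= len(majority).
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    rng = rng or random.Random()
    target = round(len(minority) + ratio * (len(majority) - len(minority)))
    # Undersample the majority class without replacement.
    majority_bag = rng.sample(majority, target)
    # Oversample the minority class by drawing extra copies with replacement.
    extra = rng.choices(minority, k=target - len(minority))
    minority_bag = list(minority) + extra
    return majority_bag, minority_bag


# Hypothetical dataset: 100 majority items and 10 minority items.
majority_items = list(range(100))
minority_items = list(range(100, 110))
maj_bag, min_bag = hybrid_sample(
    majority_items, minority_items, ratio=0.5, rng=random.Random(0)
)
print(len(maj_bag), len(min_bag))
```

In a bagging ensemble, one such bag would be drawn per base classifier, so the choice of `ratio` directly controls how much majority information is discarded and how many minority duplicates are introduced.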
|
389 |
Statistical Machine Learning & Deep Neural Networks Applied to Neural Data Analysis
Shokri Razaghi, Hooshmand January 2020
Computational neuroscience seeks to discover the underlying mechanisms by which neural activity is generated. With recent advances in neural data acquisition methods, the bottleneck of this pursuit is the analysis of the ever-growing volume of neural data acquired in numerous labs from various experiments. These analyses can be broadly divided into two categories: first, the extraction of high-quality neuronal signals from noisy large-scale recordings; second, inference for statistical models aimed at explaining the neuronal signals and the underlying processes that give rise to them. Conventionally, the majority of the methodologies employed for this effort are based on statistics and signal processing. In recent years, however, the use of Artificial Neural Networks (ANN) for neural data analysis has been gaining traction. This is due to their immense success in computer vision and natural language processing, and to the stellar track record of ANN architectures in generalizing to a wide variety of problems. In this work we investigate and improve upon statistical and ANN machine learning methods applied to multi-electrode array recordings and to inference for dynamical systems, both of which play critical roles in computational neuroscience.
In the first and second parts of this thesis, we focus on the spike sorting problem. The analysis of large-scale multi-neuronal spike train data is crucial for current and future neuroscience research. However, this type of data is not available directly from recordings, which require further processing to be converted into spike trains. Dense multi-electrode arrays (MEA) are a standard method for collecting such recordings. The processing needed to extract spike trains from these raw electrical signals is carried out by "spike sorting" algorithms. We introduce a robust and scalable MEA spike sorting pipeline, YASS (Yet Another Spike Sorter), to address many challenges that are inherent to this task. We focus primarily on MEA data collected from the primate retina, because of the unique challenges it poses and the available side information that ultimately helps us score different spike sorting pipelines. We also introduce a neural network architecture and an accompanying training scheme specifically devised to address the challenging task of deconvolution in MEA recordings.
In the last part, we shift our attention to inference for non-linear dynamics. Dynamical systems are the governing force behind many real world phenomena and temporally correlated data. Recently, a number of neural network architectures have been proposed to address inference for nonlinear dynamical systems. We introduce two different methods based on normalizing flows for posterior inference in latent non-linear dynamical systems. We also present gradient-based amortized posterior inference approaches using the auto-encoding variational Bayes framework that can be applied to a wide range of generative models with nonlinear dynamics. We call our method Filtering Normalizing Flows (FNF). FNF performs favorably against state-of-the-art inference methods in terms of accuracy of predictions and quality of uncovered codes and dynamics on synthetic data.
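The change-of-variables rule that underlies normalizing flows can be sketched with a toy one-dimensional example. This is not the FNF method from the thesis: it only composes invertible affine maps on a standard normal base density, and the function names and layer parameters are hypothetical.

```python
import math


def std_normal_logpdf(z):
    """Log density of the standard normal base distribution."""
    return -0.5 * (z * z + math.log(2 * math.pi))


def affine_flow_logpdf(x, layers):
    """Log density of x under a stack of affine maps z -> scale * z + shift.

    `layers` lists (scale, shift) pairs in the order they are applied to the
    base sample, so the maps are inverted in reverse order here.
    """
    log_det = 0.0
    for scale, shift in reversed(layers):
        x = (x - shift) / scale           # invert one layer
        log_det += math.log(abs(scale))   # accumulate log |d x / d z|
    # Change of variables: log p(x) = log p_base(z) - log |det Jacobian|.
    return std_normal_logpdf(x) - log_det


# A single layer (scale 2, shift 1) turns N(0, 1) into N(1, 4).
value = affine_flow_logpdf(3.0, [(2.0, 1.0)])
print(round(value, 4))
```

Practical flows replace the affine maps with learned nonlinear invertible transformations, but the density is evaluated with the same rule.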
|
390 |
Essays on the use of probabilistic machine learning for estimating customer preferences with limited information
Padilla, Nicolas January 2021
In this thesis, I explore in two essays how to augment thin historical purchase data with other sources of information using Bayesian and probabilistic machine learning frameworks to better infer customers' preferences and their future behavior. In the first essay, I posit that firms can better manage recently-acquired customers by using the information from acquisition to inform future demand preferences for those customers. I develop a probabilistic machine learning model based on Deep Exponential Families to relate multiple acquisition characteristics with individual level demand parameters, and I show that the model is able to capture flexibly non-linear relationships between acquisition behaviors and demand parameters. I estimate the model using data from a retail context and show that firms can better identify which new customers are the most valuable.
In the second essay, I explore how to combine the information collected through the customer journey—search queries, clicks and purchases; both within-journeys and across journeys—to infer the customer’s preferences and likelihood of buying, in settings in which there is thin purchase history and where preferences might change from one purchase journey to another.
I propose a non-parametric Bayesian model that combines these different sources of information and accounts for what I call context heterogeneity, which are journey-specific preferences that depend on the context of the specific journey. I apply the model in the context of airline ticket purchases using data from one of the largest travel search websites and show that the model is able to accurately infer preferences and predict choice in an environment characterized by very thin historical data. I find strong context heterogeneity across journeys, reinforcing the idea that treating all journeys as stemming from the same set of preferences may lead to erroneous inferences.
|