131.
Hellinger Distance-based Similarity Measures for Recommender Systems. Goussakov, Roma. January 2020.
Recommender systems are used in online sales and e-commerce to recommend potential items/products for customers to buy, based on their previous buying preferences and related behaviours. Collaborative filtering is a popular computational technique that has been used worldwide for such personalized recommendations. Of its two forms, neighbourhood-based and model-based, the neighbourhood-based approach is the more popular yet relatively simple one. It relies on the idea that a certain item might be of interest to a given customer (active user) either if he appreciated similar items in the buying space, or if the item is appreciated by similar users (neighbours). Different kinds of similarity measures are used to implement this idea. This thesis sets out to compare different user-based similarity measures and to define meaningful measures based on the Hellinger distance, a metric on the space of probability distributions. Data from the popular MovieLens database is used to show the effectiveness of different Hellinger distance-based measures compared to other popular measures such as Pearson correlation (PC), cosine similarity, constrained PC and JMSD. The performance of the different similarity measures is then evaluated with the help of mean absolute error, root mean squared error and F-score. From the results, no evidence was found to claim that Hellinger distance-based measures performed better than the more popular similarity measures for the given dataset.
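The Hellinger distance between two discrete probability distributions P and Q is H(P, Q) = (1/sqrt(2)) * sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2), which lies in [0, 1]. A minimal sketch of one way such a user-based similarity could be built; the function names and the histogram-over-rating-levels construction are illustrative assumptions, not the thesis's exact definition:

```python
import numpy as np

def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

def hellinger_similarity(ratings_u, ratings_v, levels=(1, 2, 3, 4, 5)):
    """Turn each user's ratings into a histogram over the rating levels
    and return 1 - H(P, Q), so identical profiles give similarity 1."""
    p = np.array([np.mean(np.asarray(ratings_u) == r) for r in levels])
    q = np.array([np.mean(np.asarray(ratings_v) == r) for r in levels])
    return 1.0 - hellinger_distance(p, q)
```

Since H is 0 for identical distributions and 1 for distributions with disjoint support, 1 - H gives a similarity score in [0, 1].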
132.
N-sphere Clustering. Pahmp, Oliver. January 2020.
This thesis introduces n-sphere clustering, a new method of cluster analysis akin to agglomerative hierarchical clustering. It relies on expanding n-spheres around each observation until they intersect, and then clusters observations based on these intersections, the distance between the spheres, and the density of observations. Many commonly used clustering methods currently struggle when clusters have more complex shapes. The aim of n-sphere clustering is to provide a method that functions reasonably well regardless of the shape of the clusters. Accuracy is shown to be low, particularly when clusters overlap, and extremely sensitive to noise. The time complexity of the algorithm is prohibitively large for large datasets, further limiting its potential use.
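As a rough illustration of the sphere-intersection idea only (not the thesis's full algorithm, which also uses inter-sphere distances and observation density), one can link any two observations whose radius-r balls overlap, i.e. whose centres lie closer than 2r apart; that simplified rule reduces to single-linkage clustering cut at 2r. The function name is an assumption for this sketch:

```python
import numpy as np

def sphere_clusters(points, radius):
    """Toy version of the sphere-intersection idea: place a ball of the
    given radius around every observation and put two observations in the
    same cluster when their balls overlap (distance < 2 * radius)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    parent = list(range(n))          # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(pts[i] - pts[j]) < 2 * radius:
                parent[find(i)] = find(j)  # merge overlapping spheres

    labels = [find(i) for i in range(n)]
    remap = {root: k for k, root in enumerate(dict.fromkeys(labels))}
    return [remap[l] for l in labels]      # relabel clusters as 0, 1, 2, ...
```

The quadratic pairwise loop also makes the time-complexity concern in the abstract concrete: without spatial indexing, every expansion step compares all pairs of observations.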
133.
Conditional mean variables: A method for estimating latent linear relationships with discretized observations. Berggren, Mathias. January 2020.
No description available.
134.
A small sample study of some sandwich estimators to handle heteroscedasticity. Westman, Viking. January 2021.
This simulation study investigates heteroscedasticity-consistent (HC) covariance matrix estimation using the sandwich method in relatively small samples. The different estimators are evaluated on how accurately they assign confidence intervals around a fixed, true coefficient, under random sampling and both homo- and heteroscedasticity. Standard errors are also collected to further analyze the coefficients. All of the HC estimators seemed to overadjust in most homoscedastic cases, creating intervals that clearly exceeded their nominal coverage, while the standard procedure that assumes homoscedasticity produced intervals most consistent with the nominal level. In the presence of heteroscedasticity, the comparative accuracy of the HC estimators improved and they were often better than the non-robust error estimator, with the exception of the intercept, for which they all heavily underestimated the confidence intervals; in turn, the constant-variance estimator was subject to a larger mean error for that parameter. While it is clear from previous studies that sandwich estimation can lead to more accurate results, it was rarely much better than, and sometimes strictly worse than, the non-robust, constant-variance errors provided by OLS estimation. The conclusion is to stay cautious when applying HC estimators to your model, and to test and make sure that they do in fact improve the areas where heteroscedasticity presents an issue.
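The sandwich form underlying the HC estimators is (X'X)^(-1) X' Omega X (X'X)^(-1), where Omega = diag(e_i^2) for HC0 and HC1 additionally rescales by n/(n - k). A hedged sketch of that computation follows; the function name is an illustrative assumption, and production implementations exist in standard statistics libraries:

```python
import numpy as np

def hc_standard_errors(x, y, hc="HC0"):
    """OLS fit with a heteroscedasticity-consistent (sandwich) covariance.
    Returns the coefficients and their robust standard errors."""
    X = np.column_stack([np.ones(len(y)), np.asarray(x)])  # add intercept
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    n, k = X.shape
    omega = resid ** 2                    # HC0 "meat" weights
    if hc == "HC1":
        omega = omega * n / (n - k)       # small-sample degrees-of-freedom fix
    meat = X.T @ (X * omega[:, None])
    cov = XtX_inv @ meat @ XtX_inv        # the "sandwich"
    return beta, np.sqrt(np.diag(cov))
```

Comparing these robust standard errors against the usual constant-variance OLS errors, under simulated homo- and heteroscedastic noise, is exactly the kind of experiment the study describes.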
135.
History and properties of random recursive trees. Wikström, Victor. January 2020.
No description available.
136.
Majority bootstrap percolation and paths in G(n, p). Lundblad, Jacob. January 2021.
No description available.
137.
Fitting Yield Curve with Dynamic Nelson-Siegel Models: Evidence from Sweden. Huang, Zhe. January 2021.
No description available.
138.
Comparison of multiple imputation methods for missing data: A simulation study. Schelhaas, Sjoerd. January 2021.
Despite a well-designed and controlled study, missing values are consistently present in research. It is well established that when missingness is disregarded by analyzing complete cases only, statistical power is reduced and parameter estimates are biased. Traditional methods of imputing missing data cannot account for this misleading representation of the data; research shows that traditional methods such as single imputation often underestimate the variance. This problem can be bypassed by imputing each missing value multiple times and taking the uncertainty of the imputation into consideration. In this thesis a simulation study is conducted to compare two different multiple imputation models: a specified linear stochastic regression model and a flexible neural network model, where the validation MSE loss is used to account for variance in the imputed values. In total, three data sets are simulated from a bivariate linear regression model, where some of the values in Y2 are MAR given the Y1 variable. When the neural network is applied 30 times to the datasets with 25, 50 and 75 percent missing values and the results of the regression analysis on the completed data are pooled, almost all confidence intervals for the intercept cover the expected value; the only exception is the case of 75 percent missingness. When multiple imputation by chained equations is applied to the data sets, the true intercept is covered by all confidence intervals. When 25 percent of the data is missing, both models yield unbiased results.
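A minimal sketch of the stochastic-regression half of the comparison, under assumed details (the function name, the OLS fit via least squares, and the residual-based noise scale are illustrative choices): fit Y2 on Y1 using the observed pairs, then fill each missing Y2 as its fitted value plus random residual noise, repeated m times:

```python
import numpy as np

def stochastic_regression_imputations(y1, y2, m=30, rng=None):
    """Multiple imputation of MAR values in y2 by stochastic linear
    regression on y1: fit OLS on the observed pairs, then draw each
    imputation as fitted value + residual-scale noise.
    Returns m completed copies of y2."""
    rng = np.random.default_rng(rng)
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    obs = ~np.isnan(y2)
    X = np.column_stack([np.ones(obs.sum()), y1[obs]])
    beta, *_ = np.linalg.lstsq(X, y2[obs], rcond=None)
    sigma = np.std(y2[obs] - X @ beta, ddof=2)  # residual standard deviation
    completed = []
    for _ in range(m):
        draw = y2.copy()
        miss = ~obs
        mu = beta[0] + beta[1] * y1[miss]
        draw[miss] = mu + rng.normal(scale=sigma, size=miss.sum())
        completed.append(draw)
    return completed
```

Rubin's rules would then pool the m regression estimates computed on the completed copies, which is the pooling step the abstract refers to.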
139.
Large deviations of condition numbers of random matrices. Uwamariya, Denise. January 2021.
Random matrix theory has found many applications in various fields such as physics, statistics and number theory. One important approach to studying random matrices is based on their spectral properties. In this thesis, we investigate the limiting behaviour of condition numbers of suitable random matrices in terms of large deviations. The thesis is divided into two parts. Part I provides the reader with a short introduction to the theory of large deviations, some spectral properties of random matrices, and a summary of the results we derived; in Part II, two papers are appended. In the first paper, we study the limiting behaviour of the 2-norm condition number of a p x n random matrix in terms of large deviations for large n, with p either fixed or p = p(n) → ∞ with p(n) = o(n). The entries of the random matrix are assumed to be i.i.d. with a quite general (namely sub-Gaussian) distribution. When the entries are i.i.d. normal random variables, we also obtain an application in statistical inference. The second paper deals with the β-Laguerre (or Wishart) ensembles with a general parameter β > 0; the three special cases β = 1, β = 2 and β = 4 are called the real, complex and quaternion Wishart matrices, respectively. In this paper, large deviations of the condition number are established as n → ∞, while p is either fixed or p = p(n) → ∞ with p(n) = o(n/ln(n)).
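The 2-norm condition number studied here is the ratio of the largest to the smallest singular value of the matrix. A small sketch of how one might sample it for a Gaussian p x n matrix; the function name is an illustrative assumption:

```python
import numpy as np

def condition_number(p, n, rng=None):
    """2-norm condition number of a p x n matrix with i.i.d. standard
    normal entries: the ratio of its largest to smallest singular value."""
    rng = np.random.default_rng(rng)
    X = rng.standard_normal((p, n))
    s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
    return s.max() / s.min()
```

Repeating such draws for growing n, with p fixed or p = p(n) = o(n), gives an empirical view of the tail probabilities that the large deviation principles in the thesis describe analytically.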
140.
Structural properties of problems in sequential testing and detection. Wang, Yuqiong. January 2021.
No description available.