51. The comparison of item-based and trust-based CF in sparsity problems
Wu, Chun-yi, 02 August 2007
With the dramatic growth of the Internet, it is much easier for us to acquire information than before. It is, however, relatively difficult to extract the desired information from such a huge information pool. One method is to rely on search engines, which analyze queried keywords to locate relevant information. The other is to let recommender systems suggest items users may be interested in by analyzing those users' past preferences or the preferences of other users with similar interests, thereby lessening the information-processing load.
Typical recommendation techniques are classified into content-based filtering and collaborative filtering (CF). Several studies in the literature have indicated that the performance of collaborative filtering is superior to that of content-based filtering, since it is subject to neither the content format nor users' past experiences. The collaborative filtering technique, however, has its own limitation: the sparsity problem. To relieve this problem, researchers have proposed several CF variants, including item-based CF and trust-based CF. Few works in the literature, however, compare their performance. The objective of this research is thus to evaluate both approaches under different settings, such as the sparsity degree, the data scale, and the number of neighbors used to make recommendations.
We conducted two experiments to examine their performance. The results show that trust-based CF is generally better than item-based CF under sparsity. Their difference, however, becomes insignificant as sparsity decreases. In addition, the computational time for trust-based CF increases more quickly than that for item-based CF, even though both exhibit exponential growth. Finally, the optimal number of nearest neighbors in both approaches does not depend heavily on the data scale but remains stable across scales.
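As a rough illustration of the item-based side of this comparison, the following Python/numpy sketch computes item-item cosine similarities on a toy rating matrix and predicts a missing rating from the user's most similar rated items. The matrix, neighborhood size, and fallback rule are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

# Toy user-item rating matrix: rows are users, columns are items, 0 means "unrated".
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

def item_cosine_similarity(R):
    """Cosine similarity between item columns (unrated entries contribute zero)."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0
    Rn = R / norms
    return Rn.T @ Rn

def predict(R, sim, user, item, k=2):
    """Similarity-weighted average over the k items most similar to `item`
    that the user has already rated."""
    rated = np.where(R[user] > 0)[0]
    neighbours = rated[np.argsort(-sim[item, rated])][:k]
    weights = sim[item, neighbours]
    if weights.sum() == 0:
        return float(R[R > 0].mean())        # fall back to the global mean rating
    return float(weights @ R[user, neighbours] / weights.sum())

sim = item_cosine_similarity(R)
print(predict(R, sim, user=1, item=1))       # estimate user 1's rating of item 1
```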
52. Performance Analysis between Two Sparsity Constrained MRI Methods: Highly Constrained Backprojection (HYPR) and Compressed Sensing (CS) for Dynamic Imaging
Arzouni, Nibal, August 2010
One of the most important challenges in dynamic magnetic resonance imaging (MRI) is to achieve high spatial and temporal resolution when it is limited by system performance. It is desirable to acquire data fast enough to capture the dynamics in the image time series without losing high spatial resolution and signal-to-noise ratio. Many techniques have been introduced in recent decades to achieve this goal. Newly developed algorithms like Highly Constrained Backprojection (HYPR) and Compressed Sensing (CS) reconstruct images from highly undersampled data using constraints. Using these algorithms, it is possible to achieve high temporal resolution in the dynamic image time series with high spatial resolution and signal-to-noise ratio (SNR). In this thesis we have compared the performance of the HYPR algorithm to that of CS. In assessing the reconstructed image quality, we considered computation time, spatial resolution, noise amplification factors, and artifact power (AP), using the same number of views in both algorithms, a number below the Nyquist requirement. In the simulations performed, CS always provides higher spatial resolution than HYPR, but compared to HYPR it is limited by reconstruction time and SNR. HYPR performs better than CS in terms of SNR and computation time when the images are sparse enough; however, it suffers from streaking artifacts when the image data are less sparse.
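To make the sparsity-constrained reconstruction idea concrete, here is a toy compressed-sensing example in Python/numpy: a sparse signal is recovered from far fewer random measurements than unknowns by iterative soft thresholding (ISTA). The sensing matrix, signal model, and parameters are illustrative assumptions, not the specific HYPR or CS implementations evaluated in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 80, 8                           # signal length, measurements, nonzeros
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # random sensing matrix
y = A @ x_true                                 # undersampled, noise-free data

def ista(A, y, lam=0.01, n_iter=500):
    """Iterative soft-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

x_hat = ista(A, y)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```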
53. Machine Learning Methods for Visual Object Detection
Hussain, Sibt Ul, 07 December 2011
The goal of this thesis is to develop better practical methods for detecting common object classes in real world images. We present a family of object detectors that combine Histogram of Oriented Gradient (HOG), Local Binary Pattern (LBP) and Local Ternary Pattern (LTP) features with efficient Latent SVM classifiers and effective dimensionality reduction and sparsification schemes to give state-of-the-art performance on several important datasets including PASCAL VOC2006 and VOC2007, INRIA Person and ETHZ. The three main contributions are as follows. Firstly, we pioneer the use of Local Ternary Pattern features for object detection, showing that LTP gives better overall performance than HOG and LBP, because it captures both rich local texture and object shape information while being resistant to variations in lighting conditions. It thus works well both for classes that are recognized mainly by their structure and ones that are recognized mainly by their textures. We also show that HOG, LBP and LTP complement one another, so that an extended feature set that incorporates all three of them gives further improvements in performance. Secondly, in order to tackle the speed and memory usage problems associated with high-dimensional modern feature sets, we propose two effective dimensionality reduction techniques. The first, feature projection using Partial Least Squares, allows detectors to be trained more rapidly with negligible loss of accuracy and no loss of run time speed for linear detectors. The second, feature selection using SVM weight truncation, allows active feature sets to be reduced in size by almost an order of magnitude with little or no loss, and often a small gain, in detector accuracy. Despite its simplicity, this feature selection scheme outperforms all of the other sparsity enforcing methods that we have tested. Lastly, we describe work in progress on Local Quantized Patterns (LQP), a generalized form of local pattern features that uses lookup table based vector quantization to provide local pattern style pixel neighbourhood codings that have the speed of LBP/LTP and some of the flexibility and power of traditional visual word representations. Our experiments show that LQP outperforms all of the other feature sets tested including HOG, LBP and LTP.
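As a point of reference for the LTP features described above, here is a minimal dense LTP coding in Python/numpy: each neighbour is compared to the centre pixel with a threshold t, and the resulting ternary code is split into positive and negative binary channels, which are then histogrammed LBP-style. The threshold, toy image, and plain-loop implementation are illustrative; the thesis detectors use far richer, optimized feature pipelines.

```python
import numpy as np

def ltp_codes(img, t=5):
    """Return the positive and negative LTP channels as 8-bit codes per pixel."""
    img = img.astype(np.int32)
    c = img[1:-1, 1:-1]                        # centre pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    pos = np.zeros_like(c)
    neg = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        pos |= (nb > c + t).astype(np.int32) << bit   # neighbour clearly brighter
        neg |= (nb < c - t).astype(np.int32) << bit   # neighbour clearly darker
    return pos, neg

img = np.random.default_rng(0).integers(0, 256, (32, 32)).astype(np.uint8)
p, n = ltp_codes(img)
hist_pos = np.bincount(p.ravel(), minlength=256)      # LBP-style histogram per channel
hist_neg = np.bincount(n.ravel(), minlength=256)
```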
54. Factor Models to Describe Linear and Non-linear Structure in High Dimensional Gene Expression Data
Mayrink, Vinicius Diniz, January 2011
An important problem in the analysis of gene expression data is the identification of groups of features that are coherently expressed. For example, one often wishes to know whether a group of genes, clustered because of correlation in one data set, is still highly co-expressed in another data set. For some microarray platforms there are many, relatively short, probes for each gene of interest. In this case, it is possible that a given probe is not measuring its targeted transcript but rather a different gene with a similar region (called cross-hybridization). Similarly, the incorrect mapping of short nucleotide sequences to a target gene is a common issue in the relatively young RNA-Seq technology. The expression pattern across samples is a valuable source of information, which can be used to address distinct problems through the application of factor models. Our first study focuses on the identification of the presence/absence status of a gene in a sample. We compare our factor model to state-of-the-art detection methods; the results suggest superior performance of the factor analysis for detecting transcripts. In the second study, we apply factor models to investigate gene modules (groups of coherently expressed genes). Variation in the number of copies of regions of the genome is a well-known and important feature of most cancers. Copy number alteration is detected for a group of genes in breast cancer; our goal is to examine this abnormality in the same chromosomal region for other types of tumors (ovarian, lung and brain). In the third application, the expression pattern in RNA-Seq count data is evaluated through a factor model based on the Poisson distribution. Here, the presence/absence of coherent patterns is closely associated with the number of incorrect read mappings. The final study of this dissertation is dedicated to the analysis of multi-factor models with linear and non-linear structures of interaction between latent factors. The interaction terms can have important implications in the model; they represent relationships between genes which cannot be captured in an ordinary analysis.
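As a crude stand-in for the factor-model idea, the following Python/numpy sketch fits a single latent factor by SVD to simulated genes-by-samples data and shows how that factor captures a coherently expressed module. The simulation, noise level, and SVD estimator are assumptions; the dissertation develops richer Bayesian factor models rather than this shortcut.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_samples = 50, 20
factor = rng.standard_normal(n_samples)             # latent activity per sample
loadings = np.zeros(n_genes)
loadings[:10] = rng.uniform(0.8, 1.2, 10)           # only the first 10 genes form a module
X = np.outer(loadings, factor) + 0.3 * rng.standard_normal((n_genes, n_samples))

# Rank-1 factor estimate: leading left singular vector scaled by its singular value.
U, s, Vt = np.linalg.svd(X - X.mean(axis=1, keepdims=True), full_matrices=False)
estimated_loadings = U[:, 0] * s[0]

# Module genes receive much larger loadings than background genes.
print("module genes:    ", np.abs(estimated_loadings[:10]).mean())
print("background genes:", np.abs(estimated_loadings[10:]).mean())
```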
55. Sparse Modeling in Classification, Compression and Detection
Chen, Jihong, 12 July 2004
The principal focus of this thesis is the exploration of sparse structures in a variety of statistical modelling problems. While more comprehensive models can be useful for solving a larger number of problems, their estimation may be ill-posed in most practical instances because informative features are sparse in the data. If this sparse structure can be exploited, the models can often be solved very efficiently.
The thesis is composed of four projects. Firstly, feature sparsity is incorporated to improve the performance of support vector machines when many noise features are present. The second project is an empirical study of how to construct an optimal cascade structure. The third project involves the design of a progressive, rate-distortion optimized shape coder that combines the zero-tree algorithm with a beamlet structure. Finally, the longest-run statistic is applied to the detection of a filamentary structure in a two-dimensional rectangular region.
The fundamental idea common to these projects is to extract an efficient summary from a large amount of data. The main contributions of this work are to develop and implement novel techniques for the efficient solution of several difficult problems that arise in statistical signal/image processing.
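As a toy illustration of the longest-run idea in the last project, the Python/numpy sketch below shows that the longest run of above-threshold pixels along a row responds strongly to a faint filament crossing a noisy region. The threshold, image size, and decision rule are illustrative assumptions, not the thesis procedure.

```python
import numpy as np

def longest_run(binary_row):
    """Length of the longest run of ones in a 1-D binary array."""
    best = run = 0
    for v in binary_row:
        run = run + 1 if v else 0
        best = max(best, run)
    return best

rng = np.random.default_rng(2)
noise = rng.standard_normal((64, 64))
image = noise.copy()
image[32, 10:50] += 2.0                                # faint horizontal filament

mask = image > 1.0
print("with filament:", max(longest_run(row) for row in mask))
print("noise only:   ", max(longest_run(row) for row in (noise > 1.0)))
```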
56. Principal Components Analysis for Binary Data
Lee, Seokho, May 2009
Principal components analysis (PCA) has been widely used as a statistical tool for the dimension reduction of multivariate data in various application areas and has been extensively studied in the long history of statistics. One limitation of the PCA machinery is that it can be applied only to continuous variables. Recent advances of information technology in various applied areas have created numerous large, diverse data sets with high-dimensional feature spaces, including high-dimensional binary data. In spite of such great demand, only a few methodologies tailored to such binary data sets have been suggested. The methodology we developed is a model-based approach that generalizes PCA to binary data. We developed a statistical model for binary PCA and proposed two stable estimation procedures using an MM algorithm and a variational method. By incorporating a regularization technique, the selection of important variables is achieved automatically. We also proposed an efficient algorithm for model selection, including the choice of the number of principal components and the regularization parameter.
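A minimal sketch of one well-known MM scheme for binary (logistic) PCA is given below: the Bernoulli log-likelihood is majorized by a quadratic (its curvature is bounded by 1/4), so each iteration reduces to a truncated SVD of a working data matrix. This is an illustrative, unregularized variant with an ad hoc clipping step, not the regularized procedures developed in the thesis.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def binary_pca_mm(X, k=2, n_iter=100):
    """X is an n x p matrix of 0/1 entries; returns a rank-k natural-parameter matrix."""
    Theta = np.zeros(X.shape)
    for _ in range(n_iter):
        Z = Theta + 4.0 * (X - sigmoid(Theta))       # majorization: working data matrix
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Theta = (U[:, :k] * s[:k]) @ Vt[:k]          # minimization: best rank-k fit
        Theta = np.clip(Theta, -8.0, 8.0)            # crude stabilization; the thesis uses
                                                     # proper regularization instead
    return Theta

rng = np.random.default_rng(3)
scores = rng.standard_normal((100, 2))
loadings = rng.standard_normal((2, 10))
X = (sigmoid(scores @ loadings) > rng.uniform(size=(100, 10))).astype(float)
Theta_hat = binary_pca_mm(X)
print("fitted probabilities span [%.2f, %.2f]"
      % (sigmoid(Theta_hat).min(), sigmoid(Theta_hat).max()))
```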
57. Item-level Trust-based Collaborative Filtering Approach to Recommender Systems
Lu, Chia-Ju, 23 July 2008
With the rapid growth of the Internet, more and more information is disseminated on the World Wide Web. It is therefore not an easy task to acquire desired information from the Web due to the information overload problem. To overcome this difficulty, two major methods arise: information retrieval and information filtering. Recommender systems that employ information filtering techniques also emerge to serve users whose requirements are too vague to express explicitly as keywords.
Collaborative filtering (CF) makes recommendations by comparing novel information with the common interests shared by a group of people. CF, however, has a major problem: sparsity. This problem refers to the situation in which the coverage of ratings is very sparse. With few data available, the user similarity employed in CF becomes unstable and thus unreliable in the recommendation process. Recently, several collaborative filtering variants have arisen to tackle the sparsity problem. One of them is item-based CF, as opposed to traditional user-based CF. This approach focuses on the correlations of items based on users' co-ratings. Another popular variant is trust-based CF. In such an approach, a second component, trust, is taken into account and employed in the recommendation process.
The objective of this research is thus to propose a hybrid approach that combines the advantages of both for better performance. We propose the item-level trust-based collaborative filtering (ITBCF) approach to alleviate the sparsity problem. We observe that ITBCF outperforms trust-based CF (TBCF) in every situation we consider. This confirms our conjecture that item-level trusts that take neighbors into account can stabilize the derived trust values and thus improve performance.
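The following hedged sketch shows one simple way trust can be blended with rating similarity in a CF prediction. The trust proxy (agreement on co-rated items), the mixing weight alpha, and the toy matrix are illustrative assumptions and do not reproduce the thesis's item-level trust formulation.

```python
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)      # users x items, 0 = unrated

def trust(R, a, b):
    """Illustrative trust proxy: 1 minus the mean absolute rating gap on co-rated items."""
    co = (R[a] > 0) & (R[b] > 0)
    if not co.any():
        return 0.0
    return max(0.0, 1.0 - np.abs(R[a, co] - R[b, co]).mean() / 4.0)

def predict(R, user, item, alpha=0.5):
    """Weight each neighbour by a mix of rating similarity and trust."""
    preds, weights = [], []
    for v in range(R.shape[0]):
        if v == user or R[v, item] == 0:
            continue
        co = (R[user] > 0) & (R[v] > 0)
        sim = 0.0
        if co.sum() >= 2:
            sim = np.corrcoef(R[user, co], R[v, co])[0, 1]
            sim = 0.0 if np.isnan(sim) else max(sim, 0.0)
        preds.append(R[v, item])
        weights.append(alpha * sim + (1 - alpha) * trust(R, user, v))
    weights = np.array(weights)
    if weights.sum() == 0:
        return float(R[R > 0].mean())           # fall back to the global mean rating
    return float(np.average(preds, weights=weights))

print(predict(R, user=0, item=2))               # estimate user 0's rating of item 2
```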
58. Multiframe Superresolution Techniques For Distributed Imaging Systems
Shankar, Premchandra M., January 2008
Multiframe image superresolution has been an active research area for many years. In this approach, image processing techniques are used to combine multiple low-resolution (LR) images capturing different views of an object. These multiple images are generally under-sampled, degraded by optical and pixel blurs, and corrupted by measurement noise. We exploit diversities in the imaging channels, namely the number of cameras, magnification, position, and rotation, to undo these degradations. Using an iterative back-projection (IBP) algorithm, we quantify the improvements in image fidelity gained by using multiple frames compared to a single frame, and discuss the effects of system parameters on the reconstruction fidelity. As an example, for a system in which the pixel size is matched to the optical blur size at moderate detector noise, we can reduce the reconstruction root-mean-square error by 570% by using 16 cameras and a large amount of diversity in deployment. We develop a new technique for superresolving binary imagery by incorporating finite-alphabet prior knowledge. We employ a message-passing based algorithm called two-dimensional distributed data detection (2D4) to estimate the object pixel likelihoods. We present a novel complexity-reduction technique that makes the algorithm suitable even for channels with support size as large as 5x5 object pixels. We compare the performance and complexity of 2D4 with that of IBP. In an imaging system with an optical blur spot matched to pixel size, and four 2x2 undersampled LR images, the reconstruction error for 2D4 is 300 times smaller than that for IBP at a signal-to-noise ratio of 38 dB. We also present a transform-domain superresolution algorithm to efficiently incorporate sparsity as a form of prior knowledge. The prior knowledge that the object is sparse in some domain is incorporated in two ways: first, we use the popular L1 norm as the regularization operator; second, we model wavelet coefficients of natural objects using generalized Gaussian densities. The model parameters are learned from a set of training objects and the regularization operator is derived from these parameters. We compare the results from our algorithms with an expectation-maximization (EM) algorithm for L1 norm minimization and also with the linear minimum mean squared error (LMMSE) estimator.
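A compact Python/numpy sketch of the iterative back-projection idea follows, using integer sub-pixel shifts and a 2x2 box blur/downsample as the forward model. The forward model, step size, and shift set are illustrative assumptions; the thesis systems additionally model optics, rotation, and realistic noise.

```python
import numpy as np

def degrade(hr, shift, factor=2):
    """Shift, box-blur and downsample a high-resolution image."""
    s = np.roll(hr, shift, axis=(0, 1))
    h, w = s.shape
    return s.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def ibp(lr_frames, shifts, hr_shape, factor=2, n_iter=50, step=0.5):
    """Iterative back-projection: repeatedly add back-projected LR residuals."""
    x = np.zeros(hr_shape)
    for _ in range(n_iter):
        update = np.zeros(hr_shape)
        for y, sh in zip(lr_frames, shifts):
            err = y - degrade(x, sh, factor)                 # data mismatch on the LR grid
            up = np.kron(err, np.ones((factor, factor)))     # back-project to the HR grid
            update += np.roll(up, (-sh[0], -sh[1]), axis=(0, 1))
        x += step * update / len(lr_frames)
    return x

rng = np.random.default_rng(4)
hr_true = rng.random((32, 32))
shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
frames = [degrade(hr_true, s) + 0.01 * rng.standard_normal((16, 16)) for s in shifts]
hr_est = ibp(frames, shifts, hr_true.shape)
print("RMSE:", np.sqrt(np.mean((hr_est - hr_true) ** 2)))
```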
59. Sparse coding for machine learning, image processing and computer vision
Mairal, Julien, 30 November 2010
We study in this thesis a particular machine learning approach to representing signals that consists of modelling data as linear combinations of a few elements from a learned dictionary. It can be viewed as an extension of the classical wavelet framework, whose goal is to design such dictionaries (often orthonormal bases) adapted to natural signals. An important success of dictionary learning methods has been their ability to model natural image patches and the image denoising performance this has yielded. We address several open questions related to this framework: How to efficiently optimize the dictionary? How can the model be enriched by adding structure to the dictionary? Can current image processing tools based on this method be further improved? How should one learn the dictionary when it is used for a different task than signal reconstruction? How can it be used for solving computer vision problems? We answer these questions with a multidisciplinary approach, using tools from statistical machine learning, convex and stochastic optimization, image and signal processing, computer vision, and also optimization on graphs.
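A small Python/numpy sketch of the basic dictionary learning loop follows: alternate between sparse coding of the signals over the current dictionary (here with ISTA) and a dictionary update (here a simple MOD-style least-squares step). This only illustrates the framework; the regularization value, atom count, and update rule are assumptions and not the efficient algorithms developed in the thesis.

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code(D, X, lam=0.1, n_iter=100):
    """ISTA for min_A 0.5*||X - D A||_F^2 + lam*||A||_1."""
    L = np.linalg.norm(D, 2) ** 2
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        A = soft(A - D.T @ (D @ A - X) / L, lam / L)
    return A

def dict_learn(X, n_atoms=32, n_outer=20):
    rng = np.random.default_rng(0)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):
        A = sparse_code(D, X)
        D = X @ np.linalg.pinv(A)                     # MOD dictionary update
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-8)
    return D, A

X = np.random.default_rng(1).standard_normal((64, 200))   # e.g. 8x8 patches as columns
D, A = dict_learn(X)
print("mean nonzeros per code:", (np.abs(A) > 1e-8).sum(axis=0).mean())
```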
60. Irregular sampling: from aliasing to noise
Hennenfent, Gilles; Herrmann, Felix J., January 2007
Seismic data is often irregularly and/or sparsely sampled along spatial coordinates. We show that these acquisition geometries are not necessarily an obstacle to accurately reconstructing adequately sampled data. We use two examples to illustrate that irregular sampling may actually be preferable to equivalent regular subsampling. This observation has already been made in earlier work by other authors. We explain this behavior by two key observations. Firstly, a noise-free underdetermined problem can be seen as a noisy well-determined problem. Secondly, regular subsampling creates strong, coherent acquisition noise (aliasing) that is difficult to remove, unlike the noise created by irregular subsampling, which is typically weaker and Gaussian-like.
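The second observation is easy to reproduce numerically. The Python/numpy sketch below (the test signal, subsampling factor, and simple peak measure are illustrative choices) compares the residual spectral artifacts left by regular versus random subsampling of a single sinusoid: the regular case leaves strong coherent alias peaks, the irregular case leaves weaker, noise-like energy.

```python
import numpy as np

n = 512
t = np.arange(n)
signal = np.cos(2 * np.pi * 0.3 * t)            # a single high-frequency component

regular = np.zeros(n)
regular[::4] = signal[::4]                      # keep every 4th sample

rng = np.random.default_rng(0)
idx = rng.choice(n, n // 4, replace=False)
irregular = np.zeros(n)
irregular[idx] = signal[idx]                    # keep the same number of random samples

for name, trace in [("regular", regular), ("irregular", irregular)]:
    spec = np.abs(np.fft.rfft(trace))
    spec[np.argmax(spec)] = 0.0                 # discard the single strongest peak
    print(name, "strongest residual spectral peak:", round(float(spec.max()), 1))
```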