11 |
Clickstream Analysis
Kliegr, Tomáš, January 2007
This thesis introduces current research trends in clickstream analysis and proposes a new heuristic that could be used for dimensionality reduction of semantically enriched data in Web Usage Mining (WUM). Click fraud and conversion fraud are identified as key prospective application areas for WUM. The thesis documents a conversion fraud vulnerability of Google Analytics and proposes a defense: new clickstream acquisition software that collects data in sufficient granularity and structure to allow data mining approaches to fraud detection. Three variants of the K-means clustering algorithm and three association rule mining systems are evaluated and compared on real-world web usage data.
|
12 |
Understanding interactive multidimensional projections / Compreendendo projeções multidimensionais interativas
Fadel, Samuel Gomes, 14 October 2016
The large amount of available data on a diverse range of human activities provides many opportunities for understanding and improving those activities and for revealing unknown patterns in them. Powerful automatic methods for extracting this knowledge from data are already available from machine learning and data mining. They rely, however, on the expertise of analysts to improve their results when those are not satisfactory. In this context, interactive multidimensional projections are a useful tool for the analysis of multidimensional data: they reveal the underlying structure of the data while allowing the user to manipulate the results to gain further insight into this structure. The influence of this manipulation on the mappings has received little attention, however, even though it can change the final layout in unpredictable ways. This is the main motivation for this research: understanding the effects caused by changes in these mappings. We approach this problem from two perspectives. First, from the user perspective, we designed and developed visualizations that help reduce trial and error in this process by providing the information needed to perform manipulations. Furthermore, these visualizations help explain the changes in the map caused by such manipulations. Second, we defined the effectiveness of manipulation in quantitative terms and then developed an experimental framework for assessing manipulations in multidimensional projections under this view. This framework is based on improving mappings using known evaluation measures for these techniques. Treating the improvement of each measure as a different type of manipulation, we perform a series of experiments on five datasets, five measures, and four techniques. Our experimental results show that certain types of manipulation can be performed effectively, with some techniques being more susceptible to manipulation than others.
/ The large volume of data available across a diverse range of human activities creates many opportunities to understand, improve, and reveal previously unknown patterns in those activities. Automatic methods for extracting this knowledge from data already exist in areas such as machine learning and data mining. However, they depend on the analyst's expertise to obtain better results when these are unsatisfactory. In this context, interactive multidimensional projection techniques are a useful tool for analyzing multidimensional data, revealing its underlying structure while allowing the analyst to manipulate the results interactively, extending the exploration process. This interaction, however, has not been studied in depth with respect to its real influence on the mappings, since it can cause unexpected changes in the final mapping. This is the main motivation of this research: to understand the effects caused by changes in such mappings. We approach the problem from two perspectives. First, from the user's perspective, we developed visualizations that help reduce trial and error in this process by providing the necessary information at each step of the interaction. In addition, these visualizations help explain the changes that manipulation causes in the mapping. The second perspective is the effectiveness of manipulation. We define the effectiveness of manipulation quantitatively and then develop a framework for evaluating manipulations from the standpoint of effectiveness. This framework is based on improvements to the mappings using known evaluation measures for such techniques. Using these improvements as different forms of manipulation, we performed a series of experiments on five datasets, five measures, and four techniques.
Our experimental results provide evidence that certain types of manipulation can be performed effectively, with some techniques being more susceptible to manipulation than others.
|
13 |
Development of analysis approaches to calcium-imaging data of hippocampal neurons associated with classical conditioning in mice
Yao, Zhaojie, 05 November 2016
Recent improvements in high-performance fluorescent sensors and scientific CMOS cameras enable optical imaging of neural networks at a much larger scale. Our lab has demonstrated the ability of wide-field calcium imaging (using GCaMP6f) to capture the concurrent dynamic activity of hundreds to thousands of neurons over millimeters of brain tissue in behaving mice. The expansiveness of the neuronal network captured by the system requires innovation in data analysis methods. This thesis explores data analysis techniques for extracting the dynamics of a hippocampal neural network containing a large number of individual neurons recorded using GCaMP6 while mice were learning a classical eye-puff conditioning behavior.
The GCaMP6 fluorescence signal of each neuron is first treated as one dimension, so each dataset contains hundreds to thousands of dimensions. To understand the network structure, we first applied a dimension reduction technique, Gaussian Process Factor Analysis, to examine the low-dimensional evolution of the neural trajectory; this method smooths across dimensions while extracting the low-dimensional representation. Because of the slow time course of GCaMP6 signals, the factor analysis was biased toward the long-lasting decay phase of the signal, which does not represent neural activity. We found that it is critical to infer the spike train before applying dimension reduction, for example with the Fast Nonnegative Deconvolution method. While the low-dimensional representation revealed intriguing features in the neural trajectories that paralleled the learning behavior of the animal, we examined the network directly in the high-dimensional space to further quantify the network changes. We calculated the changes in the distance of the network trajectory over time in the high-dimensional space without any filtering and compared them across different phases of the behavioral states. We found that the speed of the trajectory in the high-dimensional space is significantly higher when the animal has learned the task, and that the trajectory travelled much further away from baseline during the delay phase of the conditioning behavior. Together, these results demonstrate that the dimension reduction technique and the network trajectory within the non-reduced high-dimensional space can capture evolving features of neural networks recorded using calcium imaging. While this thesis concerns hippocampal dynamics during learning, such data analysis techniques are expected to be broadly applicable to other behaviorally relevant networks.
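The two high-dimensional quantities mentioned in this abstract, trajectory speed and distance from baseline, can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so this assumes Euclidean distance between consecutive population states and a baseline taken as the mean of the first few time points. The function names, the `dt` parameter and the toy data are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two population state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def trajectory_speed(states, dt=1.0):
    """Distance between consecutive states divided by the time step (assumed definition)."""
    return [euclidean(states[t], states[t + 1]) / dt for t in range(len(states) - 1)]


def distance_from_baseline(states, n_baseline):
    """Distance of every state from the mean of the first n_baseline states (assumed baseline)."""
    dim = len(states[0])
    baseline = [sum(s[i] for s in states[:n_baseline]) / n_baseline for i in range(dim)]
    return [euclidean(s, baseline) for s in states]


# Toy example: 4 time points, 3 "neurons" (one dimension per neuron).
states = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
    [3.0, 4.0, 12.0],
]
speeds = trajectory_speed(states)
distances = distance_from_baseline(states, n_baseline=2)
print(speeds)     # [0.0, 5.0, 12.0]
print(distances)  # [0.0, 0.0, 5.0, 13.0]
```

In a real recording each state would have one entry per neuron (hundreds to thousands), and the per-phase comparison described in the abstract would be made on slices of `speeds` and `distances`.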
|
14 |
Recognising, Representing and Mapping Natural Features in Unstructured Environments
Ramos, Fabio Tozeto, January 2008
Doctor of Philosophy (PhD) / This thesis addresses the problem of building statistical models for multi-sensor perception in unstructured outdoor environments. The perception problem is divided into three distinct tasks: recognition, representation and association. Recognition is cast as a statistical classification problem where inputs are images or a combination of images and ranging information. Given the complexity and variability of natural environments, this thesis investigates the use of Bayesian statistics and supervised dimensionality reduction to incorporate prior information and fuse sensory data. A compact probabilistic representation of natural objects is essential for many problems in field robotics. This thesis presents techniques for combining non-linear dimensionality reduction with parametric learning through Expectation Maximisation to build general representations of natural features. Once created, these models need to be rapidly processed to account for incoming information. To this end, techniques for efficient probabilistic inference are proposed. The robustness of localisation and mapping algorithms is directly related to reliable data association. Conventional algorithms employ only geometric information, which can become inconsistent for large trajectories. A new data association algorithm incorporating visual and geometric information is proposed to improve the reliability of this task. The method uses a compact probabilistic representation of objects to fuse visual and geometric information for the association decision.
The main contributions of this thesis are: 1) a stochastic representation of objects through non-linear dimensionality reduction; 2) a landmark recognition system using visual and ranging sensors; 3) a data association algorithm combining appearance and position properties; 4) a real-time algorithm for detection and segmentation of natural objects from few training images; and 5) a real-time place recognition system combining dimensionality reduction and Bayesian learning. The theoretical contributions of this thesis are demonstrated with a series of experiments in unstructured environments. In particular, the combination of recognition, representation and association algorithms is applied to the Simultaneous Localisation and Mapping (SLAM) problem to close large loops in outdoor trajectories, demonstrating the benefits of the proposed methodology.
|
15 |
The Role of Knowledge in Visual Shape Representation
Saund, Eric, 01 October 1988
This report shows how knowledge about the visual world can be built into a shape representation in the form of a descriptive vocabulary making explicit the important geometrical relationships comprising objects' shapes. Two computational tools are offered: (1) Shape tokens are placed on a Scale-Space Blackboard, (2) Dimensionality-reduction captures deformation classes in configurations of tokens. Knowledge lies in the token types and deformation classes tailored to the constraints and regularities of particular shape worlds. A hierarchical shape vocabulary has been implemented supporting several later visual tasks in the two-dimensional shape domain of the dorsal fins of fishes.
|
16 |
Cluster techniques and prediction models for a digital media learning environment
Fernandez Espinosa, Arturo, 01 August 2012
The present work applies well-known data mining techniques in a digital media
learning environment in order to identify groups of students based on their profile. We
generate identifiable clusters where some interesting patterns and rules are observed.
We generate a neural network predictive model intended to predict the success of the
students in the digital media learning environment. One of the goals of this study is to
identify a subset of variables that have the biggest impact in student performance with
respect to the learning assessments of the digital media learning environment. Three
approaches are used to perform the dimensionality reduction of our dataset.
The experiments were conducted with over 69 students of health science courses
who used the digital media learning environment. / UOIT
|
17 |
A Nonlinear Framework for Facial Animation
Bastani, Hanieh, 25 July 2008
This thesis researches techniques for modelling static facial expressions, as well as the dynamics of continuous
facial motion. We demonstrate how static and dynamic properties of facial expressions can be represented within a linear
and nonlinear context, respectively. These two representations do not act in isolation, but are mutually reinforcing in
providing a cohesive framework for the analysis, animation, and manipulation of expressive faces. We derive a basis for
the linear space of expressions through Principal Components Analysis (PCA). We introduce and formalize the notion
of "expression manifolds", manifolds residing in PCA space that model motion dynamics for semantically similar expressions.
We then integrate these manifolds into an animation workflow by performing Nonlinear Dimensionality Reduction (NLDR) on the
expression manifolds. This operation yields expression maps that encode a wealth of information relating
to complex facial dynamics, in a low dimensional space that is intuitive to navigate and efficient to manage.
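As a rough illustration of deriving a linear basis through PCA, the sketch below extracts only the first principal component by power iteration on the sample covariance matrix. It is a generic, standard-library sketch, not the thesis's implementation. The function names are hypothetical, and treating each row as one flattened expression vector is an assumption.

```python
import math


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(dim)]
    return [[r[i] - mean[i] for i in range(dim)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dim = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]


def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration.

    Assumes the data are not constant, so the covariance matrix is non-zero.
    """
    cov = covariance(center(rows))
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data: each row stands in for one flattened expression vector (assumption).
# All points lie on the line y = 2x, so the first component is along (1, 2).
expressions = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
component = first_principal_component(expressions)
```

A full PCA basis would repeat this after removing each found component from the data, or use an eigendecomposition routine from a numerical library.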
|
19 |
Semidefinite Embedding for the Dimensionality Reduction of DNA Microarray Data
Kharal, Rosina, January 2006
Harnessing the power of DNA microarray technology requires analysis methods that accurately interpret microarray data. Current literature abounds with algorithms meant for the investigation of microarray data. However, there is a need for an efficient approach that combines different techniques of microarray data analysis and provides a viable solution to dimensionality reduction of microarray data. Reducing the high dimensionality of microarray data is one approach to better understanding the information contained within the data. We propose a novel approach for dimensionality reduction of microarray data that effectively combines different techniques in the study of DNA microarrays. Our method, KAS (kernel alignment with semidefinite embedding), aids the visualization of microarray data in two dimensions and shows improvement over existing dimensionality reduction methods such as PCA, LLE and Isomap.
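The abstract does not say how KAS computes kernel alignment. As a hedged illustration, the sketch below uses the commonly cited definition: the Frobenius inner product of two kernel matrices, normalized by their Frobenius norms. Whether KAS uses exactly this form is an assumption. The function names and the toy data are hypothetical.

```python
import math


def frobenius_inner(a, b):
    """Frobenius inner product: sum of element-wise products of two matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def kernel_alignment(k1, k2):
    """Normalized alignment of two kernel matrices; equals 1.0 when k1 == k2."""
    return frobenius_inner(k1, k2) / math.sqrt(
        frobenius_inner(k1, k1) * frobenius_inner(k2, k2)
    )


def linear_kernel(samples):
    """Gram matrix of pairwise dot products."""
    return [[sum(x * y for x, y in zip(a, b)) for b in samples] for a in samples]


def label_kernel(labels):
    """Ideal target kernel y y^T for labels in {+1, -1}."""
    return [[float(a * b) for b in labels] for a in labels]


# Toy data: two samples per class, with classes on orthogonal axes.
samples = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
labels = [1, 1, -1, -1]
alignment = kernel_alignment(linear_kernel(samples), label_kernel(labels))
```

A higher alignment means the kernel's similarity structure agrees more closely with the class labels, which is the usual motivation for using it to select or tune a kernel.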
|