Global ETD Search

1	Density Based Data Clustering Albarakati, Rayan 01 March 2015 Data clustering is a data analysis technique that groups data based on a measure of similarity. When data is well clustered, the similarities between objects in the same group are high, while the similarities between objects in different groups are low. Data clustering is widely applied in areas such as bioinformatics, image segmentation and market research. This project conducted an in-depth study of data clustering with a focus on density-based clustering methods. The latest density-based algorithm (CFSFDP) is based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. This method has been examined, tested experimentally, and improved. Three methods (KNN-based, Gaussian kernel-based and iterative Gaussian kernel-based) are applied in this project to improve CFSFDP density-based clustering. The methods are applied to four milestone datasets and the results are analyzed and compared. Clustering analysis density-based CFSFDP Iterative Gaussian Kernel-based Other Computer Engineering
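The density-peaks idea described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the Gaussian-kernel density, the bandwidth `sigma`, and the use of the product of density and distance to rank candidate centers are assumptions made for illustration.

```python
import math

def density_peaks(points, sigma):
    """Return (rho, delta) for each point, following the density-peaks idea.

    rho[i]   : Gaussian-kernel local density of point i (assumed form).
    delta[i] : distance from i to the nearest point with higher density;
               for the densest point, the largest distance to any point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(math.exp(-(dist[i][j] / sigma) ** 2) for j in range(n) if j != i)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

def pick_centers(rho, delta, k):
    """Pick the k points with the largest rho * delta as cluster centers."""
    order = sorted(range(len(rho)), key=lambda i: rho[i] * delta[i], reverse=True)
    return sorted(order[:k])

# Two small cross-shaped groups; the middle point of each is the densest.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 5.0), (5.0, 4.9)]
rho, delta = density_peaks(points, sigma=0.2)
centers = pick_centers(rho, delta, k=2)
```

In this toy data the middle point of each group combines high density with a large distance to any denser point, which is the property the abstract attributes to cluster centers.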
2	Statistical gas distribution modelling for mobile robot applications Reggente, Matteo January 2014 In this dissertation, we present and evaluate algorithms for statistical gas distribution modelling in mobile robot applications. We derive a representation of the gas distribution in natural environments using gas measurements collected with mobile robots. The algorithms fuse readings from different sensors (gas, wind and location) to create 2D or 3D maps. Throughout this thesis, the Kernel DM+V algorithm plays a central role in modelling the gas distribution. The key idea is the spatial extrapolation of the gas measurements using a Gaussian kernel. The algorithm produces four maps: the weight map shows the density of the measurements; the confidence map shows areas in which the model is considered trustworthy; the mean map represents the modelled gas distribution; the variance map represents the spatial structure of the variance of the mean estimate. The Kernel DM+V/W algorithm incorporates wind measurements in the computation of the models by modifying the shape of the Gaussian kernel according to the local wind direction and magnitude. The Kernel 3D-DM+V/W algorithm extends the previous algorithm to the third dimension using a tri-variate Gaussian kernel. Ground-truth evaluation is a critical issue for gas distribution modelling with mobile platforms. We propose two methods to evaluate gas distribution models. Firstly, we create a ground-truth gas distribution using a simulation environment, and we compare the models with this ground-truth gas distribution. Secondly, considering that a good model should explain the measurements and accurately predict new ones, we evaluate the models according to their ability to infer unseen gas concentrations. We evaluate the algorithms by carrying out experiments in different environments.
We start with a simulated environment and end with urban applications, in which we integrated gas sensors on robots designed for urban hygiene. We found that models that include wind information typically outperform models that do not.
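The four maps named in this abstract can be sketched as follows. This is only a rough illustration of kernel-weighted mapping, not the dissertation's algorithm: the function name `kernel_dmv`, the unnormalised Gaussian weights, the confidence formula, the nearest-cell residuals and the parameters `sigma` and `sigma_omega` are all assumptions.

```python
import math

def kernel_dmv(samples, cells, sigma, sigma_omega):
    """samples: list of ((x, y), concentration); cells: list of (x, y) centres.

    Returns four lists aligned with `cells`: weight, confidence, mean, variance.
    """
    r0 = sum(r for _, r in samples) / len(samples)  # global fallback mean

    def w(p, c):
        # Unnormalised Gaussian weight of a sample at p for the cell at c.
        return math.exp(-math.dist(p, c) ** 2 / (2.0 * sigma ** 2))

    weight = [sum(w(p, c) for p, _ in samples) for c in cells]
    # Confidence rises from 0 to 1 as the accumulated weight grows.
    conf = [1.0 - math.exp(-(om ** 2) / sigma_omega ** 2) for om in weight]

    mean = []
    for c, om, a in zip(cells, weight, conf):
        local = sum(w(p, c) * r for p, r in samples) / om if om > 0 else r0
        mean.append(a * local + (1.0 - a) * r0)

    # Squared residual of each sample against the mean map at its nearest cell.
    def nearest(p):
        return min(range(len(cells)), key=lambda k: math.dist(p, cells[k]))

    resid = [(p, (r - mean[nearest(p)]) ** 2) for p, r in samples]
    v0 = sum(e for _, e in resid) / len(resid)  # global fallback variance
    var = []
    for c, om, a in zip(cells, weight, conf):
        local = sum(w(p, c) * e for p, e in resid) / om if om > 0 else v0
        var.append(a * local + (1.0 - a) * v0)
    return weight, conf, mean, var

samples = [((0.0, 0.0), 10.0), ((0.2, 0.0), 12.0), ((0.0, 0.2), 11.0)]
cells = [(0.0, 0.0), (10.0, 10.0)]
weight, conf, mean, var = kernel_dmv(samples, cells, sigma=0.5, sigma_omega=1.0)
```

The cell next to the measurements gets a confidence close to 1, while the distant cell gets a confidence close to 0 and falls back to the global average, which mirrors the role the abstract gives to the confidence map.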
3	Machine Learning for incomplete data Mesquita, Diego Parente Paiva January 2017 MESQUITA, Diego Parente Paiva. Machine Learning for incomplete data. 2017. 55 pp. Dissertation (Master's in Computer Science), Universidade Federal do Ceará, Fortaleza, 2017. / Methods based on basis functions (such as the sigmoid and q-Gaussian functions) and similarity measures (such as distances or kernel functions) are widely used in machine learning and related fields. These methods often take for granted that data is fully observed and are not equipped to handle incomplete data in an organic manner. This assumption is often flawed, as incomplete data is a fact in various domains such as medical diagnosis and sensor analytics. It is therefore useful to be able to estimate the value of these functions in the presence of partially observed data. We propose methodologies to estimate the Gaussian kernel, the Euclidean distance, the Epanechnikov kernel and arbitrary basis functions in the presence of possibly incomplete feature vectors. To obtain such estimates, the incomplete feature vectors are treated as continuous random variables and, based on that, we take the expected value of the transforms of interest. Machine Learning Missing data Gaussian kernel Euclidean distance Epanechnikov kernel Basis functions
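As an illustration of taking the expected value of a distance or kernel over uncertain entries, the sketch below assumes that every feature is an independent Gaussian with a known mean and variance (variance 0 for an observed entry). That distributional model and the function names are assumptions for illustration; the dissertation's own model for the missing entries may differ.

```python
import math

def expected_sq_distance(mu_x, var_x, mu_y, var_y):
    """E[||x - y||^2] for independent x, y with the given per-feature moments."""
    return sum((a - b) ** 2 + va + vb
               for a, va, b, vb in zip(mu_x, var_x, mu_y, var_y))

def expected_gaussian_kernel(mu_x, var_x, mu_y, var_y, sigma):
    """E[exp(-||x - y||^2 / (2 sigma^2))] under independent Gaussian features.

    Per feature, x_d - y_d is Gaussian with mean (a - b) and variance
    (va + vb), which gives a closed form; the features multiply together
    because they are assumed independent.
    """
    value = 1.0
    for a, va, b, vb in zip(mu_x, var_x, mu_y, var_y):
        s2 = sigma ** 2 + va + vb
        value *= math.sqrt(sigma ** 2 / s2) * math.exp(-(a - b) ** 2 / (2.0 * s2))
    return value

# x has its second feature missing (mean 0, variance 1); y is fully observed.
d2 = expected_sq_distance([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0])
k_full = expected_gaussian_kernel([1.0, 2.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], 1.0)
k_miss = expected_gaussian_kernel([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0], 1.0)
```

With all variances set to zero the expected kernel reduces to the ordinary Gaussian kernel, so fully observed vectors are handled by the same code path.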
4	Eigen-analysis of kernel operators for nonlinear dimension reduction and discrimination Liang, Zhiyu 02 June 2014 No description available. Statistics
5	Influence of multi-trait modeling, dominance, and population structure in genomic prediction of maize hybrids / Influência da modelagem multi-trait, dominância, e estruturação populacional na predição genômica em híbridos de milho Lyra, Danilo Hottis 14 November 2017 Genomic prediction of single-crosses is a promising tool in maize breeding, increasing genetic gains and reducing selection time. A strategy that can increase accuracy is applying multiple-trait genomic prediction using selection indices, which take into account the performance under optimal and stress conditions. Moreover, factors such as dominance, structural variants, and population structure can influence the accuracy of estimates of genomic breeding values (GEBV). Therefore, the objectives were to apply genomic prediction in maize hybrids (i) including multi-trait models, (ii) incorporating dominance deviation and copy number variation effects, and (iii) controlling population structure. We used two maize datasets (HELIX and USP), consisting of 452 and 906 maize single-crosses. The traits evaluated were grain yield, plant and ear height, stay green, and four selection indices. Results from the multi-trait GBLUP and Gaussian kernel (GK) models indicate that combining selection indices in multi-trait genomic prediction (MTGP) is a viable alternative that increases the selective accuracy. Furthermore, our results suggest that the best approach is predicting hybrids including dominance deviation, mainly for complex traits. We also observed that including copy number variation effects appears suitable, given the increase in prediction accuracy and the reduction in model bias. On the other hand, adding four different sets of population structure as fixed covariates to GBLUP did not improve the prediction accuracy for grain yield and plant height. However, using nonmetric multidimensional scaling dimensions and fineSTRUCTURE group clustering increased the reliability of the GEBV for grain yield (GY) and plant height (PH), respectively. Copy number variation Gaussian kernel Non-additive effects Tropical maize
6	Improving accuracy of genomic prediction in maize single-crosses through different kernels and reducing the marker dataset / Aprimorando a acurácia da predição genômica em híbridos de milho através de diferentes kernels e redução do subconjunto de marcadores Sousa, Massáine Bandeira e 09 August 2017 In plant breeding, genomic prediction (GP) may be an efficient tool to increase the accuracy of selecting genotypes, mainly under multi-environment trials. This approach has the advantage of increasing genetic gains for complex traits and reducing costs. However, strategies are needed to increase the accuracy and reduce the bias of genomic estimated breeding values. In this context, the objectives were: i) to compare two strategies for obtaining marker subsets based on marker effects, regarding their impact on the prediction accuracy of genomic selection; and ii) to compare the accuracy of four GP methods including genotype × environment interaction and two kernels (GBLUP and Gaussian). We used a rice diversity panel (RICE) and two maize datasets (HEL and USP). These were evaluated for grain yield and plant height. Overall, the prediction accuracy and relative efficiency of genomic selection increased when using marker subsets, which have the potential to be used for building fixed arrays and reducing genotyping costs. Furthermore, using the Gaussian kernel and including the G×E effect increases the accuracy of the genomic prediction models. Gaussian kernel GBLUP Genomic selection Genotype × environment interaction
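The two kernels compared in this abstract can be sketched in a few lines. The marker coding (0/1/2), the centring, the bandwidth parameter `h` and the scaling of squared distances by their mean are assumptions for illustration and are not taken from the thesis.

```python
import math

def linear_kernel(markers):
    """GBLUP-style genomic relationship: centred markers, cross-products / p."""
    n, p = len(markers), len(markers[0])
    col_mean = [sum(row[j] for row in markers) / n for j in range(p)]
    z = [[row[j] - col_mean[j] for j in range(p)] for row in markers]
    return [[sum(a * b for a, b in zip(zi, zk)) / p for zk in z] for zi in z]

def gaussian_kernel(markers, h=1.0):
    """Gaussian kernel on squared marker distances, scaled by their mean.

    Assumes at least two genotypes differ, so the scale is non-zero.
    """
    n = len(markers)
    d2 = [[sum((a - b) ** 2 for a, b in zip(mi, mk)) for mk in markers]
          for mi in markers]
    off_diag = [d2[i][k] for i in range(n) for k in range(n) if i != k]
    scale = sum(off_diag) / len(off_diag)
    return [[math.exp(-h * d2[i][k] / scale) for k in range(n)] for i in range(n)]

# Three hypothetical genotypes scored at four markers.
markers = [[0, 1, 2, 0], [0, 1, 2, 1], [2, 1, 0, 2]]
G = linear_kernel(markers)
K = gaussian_kernel(markers)
```

Either matrix can then serve as the covariance structure among genotypes in a mixed model; the Gaussian kernel is a nonlinear function of marker distance, whereas the linear kernel is not.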
8	Contributions à l'étude de la classification spectrale et applications / Contributions to the study of spectral clustering and applications Mouysset, Sandrine 07 December 2010 Spectral clustering consists of creating, from the spectral elements of a Gaussian affinity matrix, a low-dimensional space in which data are grouped into clusters. This unsupervised method is mainly based on the Gaussian affinity measure, its parameter and its spectral elements. However, questions about the separability of clusters in the spectral projection space and about the choice of the parameter remain open. First, the role of the Gaussian affinity parameter is investigated through quality measures, and two heuristics for choosing this parameter are proposed and tested. Then, the method itself is studied through the spectral elements of the Gaussian affinity matrix. By interpreting this matrix as the discretization of the heat kernel defined on the whole space and using finite elements, the eigenvectors of the affinity matrix are shown to be asymptotic representations of functions whose support is included in a single connected component. These results help define clustering properties and conditions on the Gaussian parameter. From these theoretical elements, two parallelization strategies based on decomposition into sub-domains are formulated and tested on geometrical examples and on image processing examples. Finally, as unsupervised applications, spectral clustering is applied in genomics, to identify different gene expression profiles of a legume, and in functional PET imaging, to segment brain regions with similar time-activity curves. Clustering Spectral clustering Gaussian kernel Heat equation Finite elements Parallelization Medical imaging
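A minimal two-cluster sketch of spectral clustering with a Gaussian affinity matrix is given below. It is an illustration under assumptions, not the thesis's method: the function name, the symmetric normalisation, the power iteration used to obtain the second eigenvector and the value of `sigma` are all choices made here. The affinity matrix keeps its unit diagonal so that the normalised matrix is positive semi-definite and the power iteration is well behaved.

```python
import math

def two_way_spectral_split(points, sigma, iters=500):
    """Split points into two groups from the second eigenvector of the
    normalised Gaussian affinity matrix D^(-1/2) A D^(-1/2)."""
    n = len(points)
    a = [[math.exp(-math.dist(p, q) ** 2 / (2.0 * sigma ** 2)) for q in points]
         for p in points]
    deg = [sum(row) for row in a]
    m = [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # The leading eigenvector (eigenvalue 1) is sqrt(deg), normalised.
    norm = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / norm for d in deg]
    # Power iteration for the second eigenvector, projecting out v1 each step.
    v = [float(i) for i in range(n)]
    for _ in range(iters):
        dot = sum(x * y for x, y in zip(v, v1))
        v = [x - dot * y for x, y in zip(v, v1)]
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
    # Undo the degree scaling, then split on sign.
    return [0 if v[i] / math.sqrt(deg[i]) < 0 else 1 for i in range(n)]

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
labels = two_way_spectral_split(points, sigma=0.5)
```

When `sigma` is small relative to the gap between the groups, the cross-group affinities are almost zero and the second eigenvector is nearly constant on each group, which is the behaviour the abstract relates to connected components. A much larger `sigma` blurs this structure, which is why the choice of the parameter matters.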
9	Kernel LMS à noyau gaussien : conception, analyse et applications à divers contextes / Gaussian kernel least-mean-square : design, analysis and applications Gao, Wei 09 December 2015 The main objective of this thesis is to derive and analyze the Gaussian kernel least-mean-square (LMS) algorithm within three frameworks involving single and multiple kernels, real-valued and complex-valued, non-cooperative and cooperative distributed learning over networks. This work focuses on the stochastic behavior analysis of these kernel LMS algorithms in the mean and mean-square error sense. All the analyses are validated by numerical simulations. First, we review the basic LMS algorithm, the reproducing kernel Hilbert space (RKHS) framework, and state-of-the-art kernel adaptive filtering algorithms. Then, we study the convergence behavior of the Gaussian kernel LMS in the case where the statistics of the elements of the so-called dictionary only partially match the statistics of the input data. We introduce a modified kernel LMS algorithm based on forward-backward splitting to deal with $\ell_1$-norm regularization. The stability of the proposed algorithm is then discussed. After a review of two families of multikernel LMS algorithms, we focus on the convergence behavior of the multiple-input multikernel LMS algorithm. More generally, the characteristics of multikernel LMS algorithms are analyzed theoretically and confirmed by simulation results. Next, the augmented complex kernel LMS algorithm is introduced based on the framework of complex multikernel adaptive filtering. Then, we analyze the convergence behavior of this algorithm in the mean-square error sense. Finally, in order to cope with distributed estimation problems over networks, we derive functional diffusion strategies in RKHS. The stability of the algorithm in the mean sense is analyzed. Convergence analysis Kernel least-mean-square Gaussian kernel Multikernel Complex kernel Diffusion adaptation in RKHS
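A basic Gaussian kernel LMS filter with a fixed dictionary can be sketched as follows. This is a generic illustration rather than the algorithms analyzed in the thesis: the function names, the preset dictionary, the step size `step`, the bandwidth `xi` and the toy target function are assumptions.

```python
import math

def gauss(u, v, xi):
    """Gaussian kernel between two input vectors with bandwidth xi."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2.0 * xi ** 2))

def klms(inputs, desired, dictionary, step, xi):
    """Kernel LMS over a fixed dictionary; returns final weights and errors."""
    alpha = [0.0] * len(dictionary)
    errors = []
    for u, d in zip(inputs, desired):
        k = [gauss(u, w, xi) for w in dictionary]          # kernelised input
        e = d - sum(a * kj for a, kj in zip(alpha, k))     # a priori error
        alpha = [a + step * e * kj for a, kj in zip(alpha, k)]  # LMS update
        errors.append(e)
    return alpha, errors

# Toy problem: learn d = sin(u) on [-3, 3) from a deterministic input sequence.
dictionary = [[-3.0 + 0.5 * j] for j in range(13)]
inputs = [[((37 * n) % 100) / 100.0 * 6.0 - 3.0] for n in range(2000)]
desired = [math.sin(u[0]) for u in inputs]
alpha, errors = klms(inputs, desired, dictionary, step=0.2, xi=0.5)
```

The filter output is a weighted sum of Gaussian kernels centred on the dictionary elements, and only the weights are adapted. The step size has to be small enough for stability, which is one of the questions a convergence analysis of this kind addresses.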
10	A Data Requisition Treatment Instrument For Clinical Quantifiable Soft Tissue Manipulation Bhattacharjee, Abhinaba 05 1900 Indiana University-Purdue University Indianapolis (IUPUI) / Soft tissue manipulation is widely used by manual therapists from a variety of healthcare disciplines to evaluate and treat neuromusculoskeletal impairments using mechanical stimulation, either by hand massage or with specially designed tools. Instrument-Assisted Soft Tissue Manipulation (IASTM) is a specific approach in which targeted pressure is applied with rigid mechanical tools to break down adhesions and scar tissue and to improve the range of motion of affected joints. The efficacy of IASTM has been demonstrated as a means to improve joint mobility, reduce pain, enhance flexibility and restore function. However, unlike ultrasound, traction, electrical stimulation and similar techniques, the practice of IASTM involves no standard for objectively characterizing massage with physical parameters. Thus, most IASTM treatments depend on subjective practitioner or patient feedback, which creates a need to quantify therapeutic massage or IASTM treatment with adequate treatment parameters in order to document, better analyze, compare and validate STM treatment as an established, state-of-the-art practice. This thesis focuses on the development and implementation of Quantifiable Soft Tissue Manipulation (QSTM™) Technology by designing an ergonomic, portable and miniaturized wired localized pressure applicator medical device (Q1) for characterizing soft tissue manipulation. The dose-load response in terms of force in newtons, the pitch angle of the device, and the stroke frequency of massage measured within the stipulated treatment time are all captured in real time to characterize a QSTM session.
QSTM PC software (Q-WARE©), featuring a Treatment Record System specific to individual patients for saving and retrieving treatment diagnostics and a real-time graphical visual monitoring system, has been developed from scratch on the Windows platform to implement the technology. Quantitative analysis of STM treatment without visual monitoring demonstrated inter-reliability and intra-reliability inconsistencies among clinicians in STM force application, whereas treatment application was more consistent with visual monitoring from the QSTM feedback system. The system also discriminated variability in the application of high, medium and low dose-loads and in stroke frequency analysis during targeted treatment sessions. / 2023-04-26 IASTM QSTM Q-Ware Stroke Frequency Dose-Load Force Quantification Treatment Mode Discrete Gaussian Kernel Active Time Dead Time 3D Load Cell IMU Sensor Device Pause State Treatment Session Treatment Sub-Session Real-Time Computation Post Processing Computation Geo-Angles
