211 |
Dimensionality Reduction for Commercial Vehicle Fleet Monitoring / Baldiwala, Aliakbar, 25 October 2018
Present-day vehicles include many new features, such as pre-crash warning, vehicle-to-vehicle communication, semi-autonomous driving systems, telematics, and drive-by-wire, all of which demand very high bandwidth from in-vehicle networks. The electronic control units (ECUs) inside a vehicle exchange information via automotive multiplexing, which allows the intelligent modules of an automotive electronic system to share data. Optimum functionality requires that this data be transmitted in real time. The bandwidth and speed requirements can be met either by using multiple buses or by moving to a higher-bandwidth bus, but both options increase the cost of the network and the complexity of the vehicle wiring.
Another option is to implement a higher-layer protocol that reduces the amount of data transferred by using data reduction (DR) techniques, thus reducing bandwidth usage. The implementation cost is minimal because changes are required only in software, not in hardware. In this work, we present a new data reduction algorithm termed the “Comprehensive Data Reduction (CDR)” algorithm. The proposed algorithm is used to minimize the bus utilization of the CAN bus for a future vehicle. The busload is reduced by compressing the parameters; thus, more messages, including lower-priority messages, can be sent efficiently on the CAN bus.
This work also compares the performance of the proposed algorithm with existing data reduction algorithms, namely the boundary-of-fifteen compression algorithm and the compression area selection algorithm. The results of the analysis show that the proposed CDR algorithm provides better data reduction than the earlier algorithms. Promising results were obtained in terms of reduction in bus utilization, compression efficiency, and percent peak load of the CAN bus. This reduction in bus utilization makes it possible to add more network nodes (ECUs) to the existing system without increasing its overall cost. The proposed algorithm was developed for the automotive environment, but it can also be used in any application where extensive information is exchanged among control units over a multiplexing bus.
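To make the link between payload size and bus utilization concrete, here is a minimal sketch. It is not the CDR algorithm, whose details the abstract does not give. It assumes standard (11-bit identifier) classic CAN data frames with 47 bits of fixed overhead per frame, interframe space included, and it ignores bit stuffing. The message set, the reduced payload sizes, and the 500 kbit/s bitrate are hypothetical values chosen for illustration.

```python
# Illustrative sketch only: estimates CAN bus utilization before and after
# a payload reduction. This is NOT the CDR algorithm from the thesis.


def frame_bits(payload_bytes: int) -> int:
    """Bits on the wire for one standard (11-bit ID) CAN data frame.

    Assumption: 47 bits of fixed overhead (interframe space included)
    and no bit stuffing.
    """
    if not 0 <= payload_bytes <= 8:
        raise ValueError("classic CAN payload is 0 to 8 bytes")
    return 47 + 8 * payload_bytes


def bus_utilization(messages, bitrate_bps: int) -> float:
    """Fraction of bus capacity used by a set of periodic messages.

    messages: iterable of (payload_bytes, period_seconds) pairs.
    """
    bits_per_second = sum(frame_bits(n) / period for n, period in messages)
    return bits_per_second / bitrate_bps


# Hypothetical message set: (payload bytes, period in seconds).
original = [(8, 0.010), (8, 0.020), (6, 0.100)]
# The same messages after a hypothetical reduction step shrinks each payload.
reduced = [(4, 0.010), (3, 0.020), (2, 0.100)]

load_before = bus_utilization(original, 500_000)
load_after = bus_utilization(reduced, 500_000)
print(f"before: {load_before:.2%}, after: {load_after:.2%}")
```

Because the fixed overhead is paid on every frame, halving the payload does not halve the load; the freed capacity is what allows additional or lower-priority messages on the same bus.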
212 |
Representações textuais e a geração de hubs: um estudo comparativo / Textual representations and the generation of hubs: a comparative study / Aguiar, Raul Freire, January 2017
Advisor: Prof. Dr. Ronaldo Pratti / Dissertation (master's) - Universidade Federal do ABC, Programa de Pós-Graduação em Ciência da Computação, 2017. / The hubness phenomenon, associated with the curse of dimensionality, has been studied from different perspectives in recent years. These studies point out that the hubness problem is present in several real-world data sets and that the presence of hubs (the tendency of some examples to appear frequently in the nearest-neighbor lists of other examples) has a series of undesirable consequences, such as an increase in misclassification error in classification tasks. In text mining, the problem also depends on the choice of text representation. Hence, the main objective of this dissertation is to evaluate the impact of hub formation in different textual representations. To the best of our knowledge, and during the period of this research, no in-depth study of the implications of the hubness effect in different textual representations could be found in the literature. The results suggest that different text representations lead to corpora with different propensities for hub formation. It was also noticed that the incidence of hubs in different text representations has a similar influence on some classifiers. We also analyzed the performance of classifiers after removing documents flagged as hubs in pre-established portions of the total data set size. This removal brought a trend of improved performance for some algorithms. Thus, although not always effective, the strategy of identifying and removing hubs with a majority of bad neighbors may be an interesting preprocessing technique to consider in order to improve the predictive performance of the text classification task.
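The definition of a hub given above can be made concrete with the k-occurrence count: the number of times each example appears in the k-nearest-neighbor lists of the other examples. The sketch below is a generic illustration, not the dissertation's code. It assumes Euclidean distance on small dense vectors, whereas text representations would typically use other distances, and it uses the skewness of the k-occurrence distribution as a common summary of hubness. The function names and the toy points are hypothetical.

```python
import math


def k_occurrences(points, k):
    """For each point, count how often it appears among the k nearest
    neighbors (Euclidean distance) of the other points."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Stable sort: ties keep their original index order.
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment; a positive value indicates a long right
    tail, i.e. a few points with very high k-occurrence (hubs)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5


# Toy data: one central point surrounded by four outer points.
points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
counts = k_occurrences(points, k=1)
print(counts, skewness(counts))
```

In this toy set the central point is the nearest neighbor of every outer point, so it collects almost all occurrences. Removing such points before classification is the kind of preprocessing the abstract evaluates.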
213 |
Classificação de dados imagens em alta dimensionalidade, empregando amostras semi-rotuladas e estimadores para as probabilidades a priori / Classification of high dimensionality image data, using semilabeled samples and estimation of the a priori probabilities / Liczbinski, Celso Antonio, January 2007
In natural scenes, some of the land-cover classes involved are often spectrally very similar, i.e., their mean vectors (first-order statistics) are nearly identical. In these cases, low-dimensional data from more traditional sensor systems such as Landsat-TM and Spot usually yield a thematic image of low accuracy. On the other hand, it is well known that high-dimensional image data allow the separation of such classes, provided that their covariance matrices (second-order statistics) differ significantly. The classification of high-dimensional image data, however, poses new problems, such as the estimation of the parameters of a parametric classifier. As the data dimensionality increases, so does the number of parameters to be estimated, particularly in the covariance matrix. In real cases, however, the number of training samples available is usually limited, which prevents a reliable estimation of the parameters required by the classifier. The paucity of training samples results in low accuracy for the thematic image, which becomes more noticeable as the data dimensionality increases. This condition is known as the Hughes phenomenon. Different approaches to mitigating it have been reported in the literature. Among the alternatives proposed, techniques that use unlabeled and semi-labeled samples to compensate for the small training set have shown promising results in the classification of high-dimensional remote sensing image data, such as AVIRIS data.
This dissertation further investigates the approach of Lemos (2003), who used semi-labeled samples to estimate the parameters of the Gaussian Maximum Likelihood (GML) classifier. Its contribution is an additional step that estimates the a priori probabilities P(ωi) of the classes involved, for use in the GML classifier. A more realistic estimate of the a priori probabilities is expected to yield decision functions better adjusted to the scene and therefore to increase the accuracy of the thematic image produced by the GML classifier. The experiments performed in this study showed an increase in the accuracy of the thematic image, suggesting the adequacy of the proposed methodology. The results also confirmed that, with a limited number of training samples, techniques based on adaptive algorithms are effective in reducing the Hughes phenomenon, and that the quadratic model performed well with the adaptive algorithm in all cases. The main conclusion is that the adaptive algorithm is useful for classifying high-dimensional image data whose classes have very similar spectral characteristics.
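For reference, the role of the a priori probabilities in a Gaussian maximum likelihood classifier can be written in the standard textbook form below. The symbols μ_i and Σ_i (class mean vector and covariance matrix) and x (pixel vector) are notation assumed here; the dissertation's estimator for P(ω_i) is not reproduced.

```latex
% Quadratic discriminant function of class \omega_i for a pixel vector x
g_i(x) = \ln P(\omega_i)
         - \tfrac{1}{2} \ln \lvert \Sigma_i \rvert
         - \tfrac{1}{2} (x - \mu_i)^{\top} \Sigma_i^{-1} (x - \mu_i)

% Decision rule: assign x to the class with the largest discriminant
x \in \omega_i \quad \text{if} \quad g_i(x) > g_j(x) \ \text{for all } j \neq i
```

When all priors are assumed equal, the first term is the same for every class and drops out of the comparison. Estimating P(ω_i) from the scene shifts the decision boundaries toward the less frequent classes, which matters most when the means are nearly identical and only the covariance terms separate the classes.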
214 |
Statistical and Dynamical Modeling of Riemannian Trajectories with Application to Human Movement Analysis / January 2016
abstract: The data explosion of the past decade is due in part to the widespread use of rich sensors that measure various physical phenomena: gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect, which measures depth information, and so on. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e. the metric is the standard Euclidean distance governed by the L2 norm. However, in many cases this assumption is violated, because the data lies on non-Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to represent, compare, and manipulate such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification and other applications more complicated. Exploiting statistical properties can help us understand the true space of such trajectories. In applications such as stroke rehabilitation, where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective.
In this regard, we propose a generalization of the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2016
215 |
Distinct Feature Learning and Nonlinear Variation Pattern Discovery Using Regularized Autoencoders / January 2016
abstract: Feature learning and the discovery of nonlinear variation patterns in high-dimensional data are important tasks in many problem domains, such as imaging, streaming data from sensors, and manufacturing. This dissertation presents several methods for learning and visualizing nonlinear variation in high-dimensional data. First, an automated method for discovering nonlinear variation patterns using deep learning autoencoders is proposed. The approach provides a functional mapping from a low-dimensional representation to the original spatially-dense data that is both interpretable and efficient with respect to preserving information. Experimental results indicate that deep learning autoencoders outperform manifold learning and principal component analysis in reproducing the original data from the learned variation sources.
A key issue in using autoencoders for nonlinear variation pattern discovery is to encourage the learning of solutions where each feature represents a unique variation source, which we define as distinct features. This problem of learning distinct features is also referred to as disentangling factors of variation in the representation learning literature. The remainder of this dissertation highlights and provides solutions for this important problem.
An alternating autoencoder training method is presented and a new measure motivated by orthogonal loadings in linear models is proposed to quantify feature distinctness in the nonlinear models. Simulated point cloud data and handwritten digit images illustrate that standard training methods for autoencoders consistently mix the true variation sources in the learned low-dimensional representation, whereas the alternating method produces solutions with more distinct patterns.
Finally, a new regularization method for learning distinct nonlinear features using autoencoders is proposed. Motivated in part by the properties of linear solutions, a series of learning constraints is implemented via regularization penalties during stochastic gradient descent training. These include the orthogonality of tangent vectors to the manifold, the correlation between learned features, and the distributions of the learned features. This regularized learning approach yields low-dimensional representations that can be better interpreted and used to identify the true sources of variation impacting a high-dimensional feature space. Experimental results demonstrate the effectiveness of this method for nonlinear variation pattern discovery on both simulated and real data sets. / Dissertation/Thesis / Doctoral Dissertation Industrial Engineering 2016
216 |
3D - Patch Based Machine Learning Systems for Alzheimer’s Disease classification via 18F-FDG PET Analysis / January 2017
abstract: Alzheimer’s disease (AD) is a chronic neurodegenerative disease that usually starts slowly and gets worse over time. It is the cause of 60% to 70% of cases of dementia. There is growing interest in identifying brain image biomarkers that help evaluate AD risk pre-symptomatically. High-dimensional non-linear pattern classification methods have been applied to structural magnetic resonance images (MRIs) and used to discriminate between clinical groups in Alzheimer’s progression. Using fluorodeoxyglucose (FDG) positron emission tomography (PET) as the preferred imaging modality, this thesis develops two independent machine-learning-based patch analysis methods and uses them to perform six binary classification experiments across different AD diagnostic categories. Specifically, features were extracted and learned using dimensionality reduction and using dictionary learning and sparse coding, by taking overlapping patches in and around the cerebral cortex and using them as features. Using AdaBoost as the preferred classifier, both methods try to utilize 18F-FDG PET as a biological marker in the early diagnosis of Alzheimer’s. Additionally, we investigate the involvement of rich demographic features (ApoeE3, ApoeE4 and Functional Activities Questionnaires (FAQ)) in classification. The experimental results on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset demonstrate the effectiveness of both proposed systems. The use of 18F-FDG PET may offer a new sensitive biomarker and enrich the brain imaging analysis toolset for studying the diagnosis and prognosis of AD. / Dissertation/Thesis / Thesis Defense Presentation / Masters Thesis Computer Science 2017
217 |
Assessing Measurement Invariance and Latent Mean Differences with Bifactor Multidimensional Data in Structural Equation Modeling / January 2018
abstract: Investigation of measurement invariance (MI) commonly assumes correct specification of dimensionality across multiple groups. Although research shows that violation of the dimensionality assumption can cause bias in model parameter estimation for single-group analyses, little research on this issue has been conducted for multiple-group analyses. This study explored the effects of mismatch in dimensionality between data and analysis models with multiple-group analyses at the population and sample levels. Datasets were generated using a bifactor model with different factor structures and were analyzed with bifactor and single-factor models to assess misspecification effects on assessments of MI and latent mean differences. As baseline models, the bifactor models fit data well and had minimal bias in latent mean estimation. However, the low convergence rates of fitting bifactor models to data with complex structures and small sample sizes caused concern. On the other hand, effects of fitting the misspecified single-factor models on the assessments of MI and latent means differed by the bifactor structures underlying data. For data following one general factor and one group factor affecting a small set of indicators, the effects of ignoring the group factor in analysis models on the tests of MI and latent mean differences were mild. In contrast, for data following one general factor and several group factors, oversimplifications of analysis models can lead to inaccurate conclusions regarding MI assessment and latent mean estimation. / Dissertation/Thesis / Doctoral Dissertation Educational Psychology 2018
219 |
Investigating Gene-Gene and Gene-Environment Interactions in the Association Between Overnutrition and Obesity-Related Phenotypes / Tessier, François, January 2017
Introduction – Animal studies suggested that NFKB1, SOCS3 and IKBKB genes could be involved in the association between overnutrition and obesity. This study aims to investigate interactions involving these genes and nutrition affecting obesity-related phenotypes.
Methods – We used multifactor dimensionality reduction (MDR) and penalized logistic regression (PLR) to better detect gene/environment interactions in data from the Toronto Nutrigenomics and Health Study (n=1639) using dichotomized body mass index (BMI) and waist circumference (WC) as obesity-related phenotypes. Exposure variables included genotypes on 54 single nucleotide polymorphisms, dietary factors and ethnicity.
Results – MDR identified interactions between SOCS3 rs6501199 and rs4969172, and IKBKB rs3747811 affecting BMI in whites; SOCS3 rs6501199 and NFKB1 rs1609798 affecting WC in whites; and SOCS3 rs4436839 and IKBKB rs3747811 affecting WC in South Asians. PLR found a main effect of SOCS3 rs12944581 on BMI among South Asians.
Conclusion – MDR and PLR gave different results, but support some results from previous studies.
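As background on the first method, the core step of multifactor dimensionality reduction pools multi-locus genotype combinations into two groups, high-risk and low-risk, which collapses several factors into a single binary attribute. The sketch below shows only that pooling step under stated assumptions. It is not the study's pipeline, omits the cross-validated search over factor combinations, and uses hypothetical function names and toy data.

```python
from collections import Counter


def mdr_fit(genotypes, is_case, threshold=None):
    """Return the set of genotype combinations labeled high-risk.

    genotypes: list of tuples, one per subject (e.g. two SNPs coded 0/1/2).
    is_case:   list of bools, True for cases and False for controls.
    threshold: case/control ratio above which a cell is high-risk;
               defaults to the overall case/control ratio (an assumption).
    """
    cases = Counter(g for g, c in zip(genotypes, is_case) if c)
    controls = Counter(g for g, c in zip(genotypes, is_case) if not c)
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    if threshold is None:
        threshold = sum(cases.values()) / sum(controls.values())
    high_risk = set()
    for cell in set(cases) | set(controls):
        # cases/controls >= threshold, cross-multiplied so that cells
        # without controls do not cause a division by zero.
        if cases[cell] >= threshold * controls[cell]:
            high_risk.add(cell)
    return high_risk


def mdr_predict(high_risk, genotypes):
    """Map each genotype combination to the pooled binary attribute."""
    return [g in high_risk for g in genotypes]


# Toy data: two SNPs, with cases concentrated where exactly one SNP is 1.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
is_case = [False, False, True, True, True, False, False, False]

high_risk = mdr_fit(genotypes, is_case)
print(sorted(high_risk))
```

In the toy data neither SNP separates cases from controls on its own, but the pooled attribute does, which is the kind of interaction without a marginal effect that the method is meant to detect.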
220 |
Contributions à l'inférence statistique dans les modèles de régression partiellement linéaires additifs / Contributions to the statistical inference in partially linear additive regression model / Chokri, Khalid, 21 November 2014
Parametric regression models provide powerful tools for analyzing data when the models are correctly specified, but may suffer from large modelling biases when the model structure is misspecified. As an alternative, nonparametric smoothing methods ease the concern about modelling bias by letting the data themselves build the model. However, in multivariate settings nonparametric models are hampered by the so-called curse of dimensionality: the rate of convergence of the estimators is a decreasing function of the dimension of the covariates.
One method for attenuating this difficulty is to model covariate effects via a partially linear structure, a combination of linear and nonlinear parts. Nevertheless, when the nonlinear part is multivariate, its nonparametric estimation is subject to the same deterioration of the rate of convergence. To reduce the impact of dimension on the estimation of the nonlinear part, we impose an additive structure on it, which yields a partially linear additive model. Our aim in this work is to establish limit results for the various parameters of the model (consistency, rates of convergence, asymptotic normality and the law of the iterated logarithm) and to construct hypothesis tests on the model structure, such as the additivity of the nonlinear part, and on its parameters.
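The model class described above is usually written as follows. The notation is assumed here for illustration and is not taken from the thesis: Y is the response, X the covariates entering linearly, Z = (Z_1, ..., Z_d) the covariates entering nonparametrically, and ε the error term.

```latex
% Partially linear additive regression model
Y = X^{\top} \beta + \sum_{j=1}^{d} f_j(Z_j) + \varepsilon,
\qquad \mathbb{E}[\varepsilon \mid X, Z] = 0

% Usual identifiability constraint on the additive components
\mathbb{E}[f_j(Z_j)] = 0, \qquad j = 1, \dots, d
```

Each f_j is a function of a single variable, so it can be estimated at a univariate nonparametric rate whatever the value of d. This is how the additive structure sidesteps the curse of dimensionality that affects a fully multivariate nonlinear part.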