Global ETD Search

1	Advancing the Effectiveness of Non-Linear Dimensionality Reduction Techniques
Gashler, Michael S. 18 May 2012.
Data that is represented with high dimensionality presents a computational complexity challenge for many existing algorithms. Limiting dimensionality by discarding attributes is sometimes a poor solution to this problem because significant high-level concepts may be encoded in the data across many or all of the attributes. Non-linear dimensionality reduction (NLDR) techniques have been successful with many problems at minimizing dimensionality while preserving intrinsic high-level concepts that are encoded with varying combinations of attributes. Unfortunately, many challenges remain with existing NLDR techniques, including excessive computational requirements, an inability to benefit from prior knowledge, and an inability to handle certain difficult conditions that occur in data from many real-world problems. Further, certain practical factors have limited advancement in NLDR, such as a lack of clarity regarding suitable applications for NLDR and a general unavailability of efficient implementations of complex algorithms. This dissertation presents a collection of papers that advance the state of NLDR in each of these areas. Contributions of this dissertation include:
• An NLDR algorithm, called Manifold Sculpting, that optimizes its solution using graduated optimization. This approach enables it to obtain better results than methods that only optimize an approximate problem. Additionally, Manifold Sculpting can benefit from prior knowledge about the problem.
• An intelligent neighbor-finding technique called SAFFRON that improves the breadth of problems that existing NLDR techniques can handle.
• A neighborhood refinement technique called CycleCut that further increases the robustness of existing NLDR techniques, and that can work in conjunction with SAFFRON to solve difficult problems.
• Demonstrations of specific applications for NLDR techniques, including the estimation of state within dynamical systems, training of recurrent neural networks, and imputing missing values in data.
• An open source toolkit containing each of the techniques described in this dissertation, as well as several existing NLDR algorithms and other useful machine learning methods.
Keywords: non-linear dimensionality reduction; manifold learning; intrinsic variables; state estimation; imputation; neighbor selection; neighborhood refinement; Computer Sciences
2	MINIMIZING CONGESTION IN PEER-TO-PEER NETWORKS UNDER THE PRESENCE OF GUARDED NODES
FAIRBANKS, MICHAEL STEWART. 20 July 2006.
No description available.
Keywords: Computer Science; Neighbor Selection; overlay construction; firewall; peer-to-peer; performance; p2p
3	Algoritmo kNN para previsão de dados temporais: funções de previsão e critérios de seleção de vizinhos próximos aplicados a variáveis ambientais em limnologia / Time series prediction using a KNN-based algorithm: prediction functions and nearest neighbor selection criteria applied to limnological data
Ferrero, Carlos Andres. 04 March 2009.
The analysis of data containing sequential information is a problem of growing interest because of the large amount of information generated by, among other sources, monitoring processes. Time series are one of the most common types of sequential data and consist of observations recorded over time. The k-Nearest Neighbor - Time Series Prediction (kNN-TSP) algorithm is a method for time series prediction. Its main advantages are its simplicity and its applicability to nonlinear time series analysis and to the prediction of seasonal behavior. Although kNN-TSP often finds the best predictions for nearly periodic time series, several questions about how to determine its parameters remain open. This work focuses on two of these parameters: the selection of the nearest neighbours and the prediction function. To this end, we propose a simple approach to selecting the nearest neighbours in which time is taken into account through the similarity measure, so that the most similar and most recent patterns are selected. We also propose a prediction function that maintains good performance in the presence of patterns at different levels of the time series. Both proposals were evaluated empirically on several artificial time series, including chaotic ones, and on real time series of environmental variables from the Itaipu reservoir, made available by Itaipu Binacional. Three strongly correlated limnological variables were considered in the experiments on the real time series: water temperature, air temperature and dissolved oxygen. A correlation analysis was also carried out to verify whether the predicted values preserve the correlation among the original variables. The results show that both the proposed nearest neighbour selection criterion and the proposed prediction function are promising.
Keywords: Environmental data; Limnology; Machine learning; Nearest neighbor selection; Prediction functions; Time series prediction
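The two parameters discussed in the abstract above can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `knn_tsp_forecast`, the `recency_weight` penalty, and the mean-offset prediction rule are all assumptions chosen only to show one plausible way to favor similar and recent windows and to stay robust to patterns at different levels.

```python
import math


def knn_tsp_forecast(series, window, k, recency_weight=0.0):
    """Forecast the next value from the k past windows most similar to the last one.

    Hypothetical sketch: the scoring and prediction rules below are
    illustrative assumptions, not the method defined in the thesis.
    """
    n = len(series)
    if n < window + 1:
        raise ValueError("series is too short for the chosen window")
    query = series[n - window:]
    q_mean = sum(query) / window
    candidates = []
    # Every past window that has a known successor value is a candidate.
    for start in range(n - window):
        past = series[start:start + window]
        p_mean = sum(past) / window
        # Compare shapes after subtracting each window's mean (its level).
        dist = math.sqrt(sum(((p - p_mean) - (q - q_mean)) ** 2
                             for p, q in zip(past, query)))
        # Assumed recency penalty: older windows receive a larger score.
        age = (n - window - start) / n
        score = dist + recency_weight * age
        # Store the successor relative to the window's own level.
        candidates.append((score, series[start + window] - p_mean))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    # Re-apply the query's level to the average relative successor.
    return q_mean + sum(offset for _, offset in nearest) / len(nearest)


# A short pattern that repeats at increasing levels.
series = [0, 1, 3, 10, 11, 13, 20, 21, 23, 30, 31]
forecast = knn_tsp_forecast(series, window=2, k=3)
print(forecast)
```

Because each window is compared after removing its mean, earlier occurrences of the same shape at lower levels still count as near neighbours, and the forecast is expressed relative to the level of the most recent window. Setting `recency_weight` above zero breaks ties between equally similar windows in favor of the newer one.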
