11 |
A Classification Algorithm Using Mahalanobis Distance Clustering Of Data With Applications On Biomedical Data Sets
Durak, Bahadir 01 January 2011
Classification has been used and studied by the scientific community
for hundreds of years, and many different methods and algorithms
have been developed over that time.
Although the classification algorithms in the literature use different methods,
they rest on a similar basis: assigning data to classes
by using defined properties or, put another way, establishing a
relationship between known features and an unknown outcome. This study was
intended to bring a different perspective to this common basis.
In this study, the class of each data point is used as a parameter
alongside its basic features, so that the information
carried by a known value is also available to the algorithm. In other words, the class to which
a data point belongs is treated as an input, and the data set is mapped to a
higher-dimensional space that serves as a new working environment. In this new
environment the problem is no longer one of classification but one of clustering.
Although this logic resembles kernel methods, the two approaches
differ in how they transform the working space. The clusters found in the
projected space, using calculations based on the
Mahalanobis distance, are evaluated in the original space with two different heuristics:
a center-based algorithm and a KNN-based algorithm. Both heuristics
achieve higher classification success rates with this methodology. For the center-based
algorithm, which is more sensitive to the new input parameter, an improvement of
up to 8% is observed.
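For reference, the Mahalanobis distance is conventionally defined as in the first expression below. The second expression is only an illustration of the idea described in this abstract: the symbols z, x_1 ... x_p and y are notation assumed here for a feature vector extended with its class label, and are not taken from the thesis.

```latex
d_M(\mathbf{z}, \boldsymbol{\mu})
  = \sqrt{(\mathbf{z} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{z} - \boldsymbol{\mu})},
\qquad
\mathbf{z} = (x_1, \dots, x_p, y)
```

Here \boldsymbol{\mu} is a cluster center and \Sigma the covariance matrix of the cluster. With \Sigma equal to the identity matrix the expression reduces to the Euclidean distance.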
|
12 |
DISTRIBUTION OF EASTERN HEMLOCK, TSUGA CANADENSIS, IN EASTERN KENTUCKY AND THE SUSCEPTIBILITY TO INVASION BY THE HEMLOCK WOOLLY ADELGID, ADELGES TSUGAE
Clark, Joshua Taylor 01 January 2010
The hemlock woolly adelgid, an invasive non-native insect, is threatening eastern hemlock in Kentucky. This study examined three techniques to map the distribution of eastern hemlock using decision trees, remote sensing, and species distribution modeling. Accuracy assessments showed that eastern hemlock was best modeled using a decision tree without incorporating satellite radiance. Using the distribution from the optimal model, risk maps for susceptibility to hemlock woolly adelgid infestation were created using two species distribution models. Environmental variables related to dispersal were used to build the models and their contributions to the models assessed. The models showed similar spatial distributions of eastern hemlock at high risk of infestation.
|
13 |
Interação genótipos por épocas de semeadura de feijoeiro comum em relação a doenças foliares em cerrados de baixa altitude
Rossetto, João Édino. January 2018
Advisor: Bruno Ettore Pavan / Abstract: Common bean (Phaseolus vulgaris L.) is one of the main food sources in Brazil, valued for both cultural and nutritional reasons. It is grown by small and large producers alike, at different technological levels, throughout the national territory. The productive potential of the bean plant is closely tied to plant health, with pathogens being the main cause of yield losses. The objective of this work was to verify the genotype x environment interaction and to stratify sowing times as environments, in order to recommend the sowing time that best discriminates among genotypes and to select the most adapted and stable genotypes with respect to angular leaf spot and common bacterial blight in low-altitude cerrado. The experiments were conducted in June, August, October and December of 2015 and March and April of 2016 at the Fazenda de Ensino, Pesquisa e Extensão da Faculdade de Engenharia de Ilha Solteira (FEIS), located in the municipality of Selvíria-MS. The experimental design was randomized blocks with 20 genotypes: 5 commercial ones (IAC - Una, IAC - Imperador, IAC - Formoso, IAC - Milênio, IAC - Alvorada) and 15 from the FEIS breeding program. The six sowing times served as the "environment" source of variation. The traits evaluated were the incidence of common bacterial blight and angular leaf spot. The genetic parameters and variance components were obtained by the REML / B... / Master's degree
|
14 |
Methods in the Assessment of Genotype-Phenotype Correlations in Rare Childhood Disease Through Orthogonal Multi-omics, High-throughput Sequencing Approaches
January 2015
abstract: Rapid advancements in genomic technologies have increased our understanding of rare human disease. It is now feasible to generate multiple types of biological data, including genetic variation from the genome or exome, expression from the transcriptome, methylation patterns from the epigenome, protein complexity from the proteome and metabolite information from the metabolome. "Omics" tools provide a comprehensive view into the biological mechanisms that impact disease trait and risk. Despite the available data types and the ability to collect them simultaneously from patients, researchers still rely on analyzing each type independently. Combining information from multiple types of biological data can reduce missing information, increase confidence in single-data findings, and provide a more complete view of genotype-phenotype correlations. Although rare disease genetics has been greatly improved by exome sequencing, a substantial portion of clinical patients remain undiagnosed. Multiple frameworks for integrative analysis of genomic and transcriptomic data are presented, with a focus on identifying functional genetic variations in patients with undiagnosed, rare childhood conditions. Direct quantitation of the X inactivation ratio was developed from genomic and transcriptomic data using allele-specific expression and segregation analysis to determine the magnitude and inheritance mode of X inactivation. This approach was applied in two families, revealing non-random X inactivation in female patients. Expression-based analysis of X inactivation showed high correlation with the standard clinical assay. These findings improved understanding of the molecular mechanisms underlying X-linked disorders. In addition, multivariate outlier analysis of gene- and exon-level RNA-seq data using Mahalanobis distance, with the distance scores integrated with genomic data, found genotype-phenotype correlations during variant prioritization in 25 families. Mahalanobis distance scores revealed variants with large transcriptional impact in patients. In this dataset, frameshift variants were more likely to result in outlier expression signatures than other types of functional variants. Integration of outlier estimates with genetic variants corroborated previously identified, presumed causal variants and highlighted a new candidate in a previously undiagnosed case. Integrative genomic approaches in easily attainable tissue will facilitate the search for biomarkers that impact disease trait, uncover pharmacogenomic targets, provide novel insight into the molecular underpinnings of uncharacterized conditions, and help improve analytical approaches that use large datasets. / Dissertation/Thesis / Doctoral Dissertation Molecular and Cellular Biology 2015
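The outlier scoring mentioned above can be illustrated with a minimal sketch. It is not the dissertation's pipeline: the reference values are made up, the two features stand in for any pair of correlated expression measurements, and the helper names (`mean_vector`, `covariance_matrix`, `solve`, `mahalanobis`) are hypothetical. Only the Python standard library is used.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows, mu):
    """Unbiased sample covariance matrix (divisor n - 1)."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        dev = [row[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += dev[a] * dev[b] / (n - 1)
    return cov

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def mahalanobis(sample, mu, cov):
    """sqrt((x - mu)^T cov^-1 (x - mu)), computed without forming the inverse."""
    dev = [s - m for s, m in zip(sample, mu)]
    weighted = solve(cov, dev)
    return math.sqrt(sum(d * w for d, w in zip(dev, weighted)))

# Made-up reference cohort: two correlated expression features per sample.
reference = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
mu = mean_vector(reference)
cov = covariance_matrix(reference, mu)

on_trend = (4.5, 4.5)   # follows the usual correlation between the features
off_trend = (4.5, 1.5)  # equally far from the mean in Euclidean terms, but breaks it

on_score = mahalanobis(on_trend, mu, cov)
off_score = mahalanobis(off_trend, mu, cov)
print(f"on-trend score:  {on_score:.2f}")
print(f"off-trend score: {off_score:.2f}")
```

Both test samples lie at the same Euclidean distance from the cohort mean, yet the one that violates the correlation between the two features receives a much larger score. That property is what makes the measure useful for flagging multivariate outliers.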
|
15 |
Interação genótipos por épocas de semeadura de feijoeiro comum em relação a doenças foliares em cerrados de baixa altitude / Interaction of genotypes by season times of common bean in relation to foliary diseases in closures of low altitude
Rossetto, João Édino 22 February 2018
Previous issue date: 2018-02-22 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Common bean (Phaseolus vulgaris L.) is one of the main food sources in Brazil, valued for both cultural and nutritional reasons. It is grown by small and large producers alike, at different technological levels, throughout the national territory. The productive potential of the bean plant is closely tied to plant health, with pathogens being the main cause of yield losses. The objective of this work was to verify the genotype x environment interaction and to stratify sowing times as environments, in order to recommend the sowing time that best discriminates among genotypes and to select the most adapted and stable genotypes with respect to angular leaf spot and common bacterial blight in low-altitude cerrado. The experiments were conducted in June, August, October and December of 2015 and March and April of 2016 at the Fazenda de Ensino, Pesquisa e Extensão da Faculdade de Engenharia de Ilha Solteira (FEIS), located in the municipality of Selvíria-MS. The experimental design was randomized blocks with 20 genotypes: 5 commercial ones (IAC - Una, IAC - Imperador, IAC - Formoso, IAC - Milênio, IAC - Alvorada) and 15 from the FEIS breeding program. The six sowing times served as the "environment" source of variation. The traits evaluated were the incidence of common bacterial blight and angular leaf spot. The genetic parameters and variance components were obtained by the REML/BLUP procedure. The MHPRVG (Harmonic Mean of the Relative Performance of Genetic Values) and AMMI (Additive Main effects and Multiplicative Interaction) methodologies and a dendrogram based on the Mahalanobis distance were used to study environmental stratification, stability and adaptability.
The results indicated an interaction between genotypes and environments, producing environments that were favorable and unfavorable to the incidence of angular leaf spot and common bacterial blight. It was possible to identify the environments (sowing times) that gave good discrimination among genotypes (A1, June 2015, and A6, April 2016) and the best genotypes for simultaneous stability and tolerance (G11 and G5).
|
16 |
Divergência genética entre acessos de açafrão (Curcuma longa L.) utilizando caracteres morfoagronômicos / Genetic divergence among genotypes of turmeric (Curcuma longa L.) using morphological and agronomic characters
Cintra, Maria Mônica Domingues Franco 20 May 2005
Previous issue date: 2005-05-20 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Two experiments were conducted in the experimental area of the Universidade
Federal de Goiás to assess the genetic divergence among 21 turmeric genotypes
grown for two years (EI) and 33 genotypes grown for one year (EII), based on
multivariate analyses, in order to select divergent and more productive genotypes
and to determine whether the Mara Rosa producers use the same genotype. The
accessions came from Goiás, Minas Gerais and São Paulo. The experiments were
conducted in the periods 2001/2003 (EI) and 2003/2004 (EII) using a randomized
block design with four replications. The assessment of EI was based on agronomic
descriptors related to production, such as total fresh weight of rhizomes, dry
weight and curcumin content, among others. The assessment of EII was based on
shoot descriptors (tiller number, leaf number, leaf area and mean height, among
others) and also on the production descriptors. Data were subjected to analysis
of variance and means were compared by the Scott-Knott test at 1% and 5%
probability. The generalized Mahalanobis distance was used as the dissimilarity
measure, and the Tocher optimization method was used to delineate groups. All
analyses were performed using the program GENES. The analyses indicate that Mara
Rosa farmers do not use the same genotype and that there is variability for the
selection of genotypes. The evaluated characteristics are strongly correlated,
which justifies the use of dissimilarity measures based on the Mahalanobis
distance. Multivariate techniques were effective for the study of genetic
divergence and separated the accessions into groups. Methods of estimating
genetic divergence in turmeric accessions, through generalized Mahalanobis
distances or canonical variables, were equivalent. Curcumin content and dry
weight were the descriptors that contributed most to genetic divergence in EI.
In EII they were number of plants and number of tillers. An increase in curcumin
content was observed after two years of cultivation. From the means and the
Mahalanobis distances, the five most productive genotypes could be indicated for
the breeding program, and the divergence analysis showed the genotypes best
suited to future hybridizations.
|
17 |
Echo of the Ancients: Evolution of Song in the Avian Family Cettiidae / Röster från forntiden: evolution av sång inom fågelfamiljen Cettiidae
Goodstadt, Jared January 2022
The Cettiidae, a family of primarily small, insectivorous, Asiatic and Austronesian mountain birds, have been the subject of acoustic analysis in the past. However, an in-depth review of the songs of the entire family had not yet been undertaken. To address this gap, the songs of 29 Cettiidae species were examined using acoustic analysis software, with specific factors such as bandwidth, frequency, and strophe duration being statistically recorded. In total, 286 individuals and over 800 strophes were analyzed, and the collected data were displayed in various PCA plots. These PCA plots were then compared to both a dated phylogenetic tree created specifically for this study and a plot of Mahalanobis distance vs. genetic distance, created using the acoustic data as well as Cytochrome b genetic data. Based on these plots, several notable trends could be observed across the entire family. While large-scale divergence from the norm was noted in several pairwise comparisons of species, as well as large-scale conservation within clades such as the island Horornis species, examples of convergent evolution of their songs were rather scant. It was also noted that, despite the strong divergence of certain species, each genus occupied its own area of multivariate space within the PCAs. Strong statistical divergence between island and continental species was also noted in both the PCAs and the Mahalanobis plot. The statistical analysis of these species, however, provided no clues as to the ancestral state of their songs. A visual analysis of every species' song, mapped onto the dated phylogenetic tree, nevertheless suggested that two distinct lineages of simple and complex songs could be traced back approximately 10 million years. This allows for speculation about the songs of now long-extinct Cettiidae species as far back as the Miocene.
|
18 |
Unsupervised Online Anomaly Detection in Multivariate Time-Series / Oövervakad online-avvikelsedetektering i flerdimensionella tidsserier
Segerholm, Ludvig January 2023
This research aims to identify a method for unsupervised online anomaly detection in multivariate time series in dynamic systems in general, and in the case study of Devwards IoT-system in particular. The solution is required to be explainable, to learn online and to have low computational expense. A comprehensive literature review was conducted, leading to the experimentation and analysis of various anomaly detection approaches. Of the methods evaluated, a single recurrent neural network autoencoder emerged as the most promising, emphasizing a simple model structure that encourages stable performance with consistent outputs, regardless of the average output. Other approaches, such as Hierarchical Temporal Memory models and an ensemble strategy of adaptive model pooling, yielded suboptimal results. A modified version of the Residual Explainer method for enhancing explainability in autoencoders for online scenarios showed promising outcomes. The use of Mahalanobis distance for anomaly detection was explored, as were feature extraction and its implications in the context of the proposed approach. In conclusion, a single, streamlined recurrent neural network appears to be the superior approach for this application, though further investigation into online learning methods is warranted. The research contributes results to the field of unsupervised online anomaly detection in multivariate time series and extends the Residual Explainer method to online autoencoders. Additionally, it offers data on the ineffectiveness of the Mahalanobis distance in an online anomaly detection environment.
|
19 |
Evaluating Long-Term Land Cover Changes for Malheur Lake, Oregon Using ENVI and ArcGIS
Woods, Ryan Joseph 01 December 2015
Land cover change over time can be a useful indicator of variations in a watershed, such as the patterns of drought in an area. I present a case study using remotely sensed images from Landsat satellites, covering a period of over 30 years, to generate classifications representing land cover categories, which I use to quantify land cover change in the watershed areas that contribute to Malheur, Mud, and Harney Lakes. I selected images about every 4 to 6 years, from late June to late July, in an attempt to capture peak vegetation growth and to avoid cloud cover. Complete coverage of the watershed required, for each chosen year, one image that included the lakes, one to the north, and one to the west of the lakes. I used the watershed areas defined by the HUC-8 shapefiles. The relevant watersheds are Harney-Malheur Lakes, Donner und Blitzen, Silver, and Silvies. To summarize the land cover classes that could be discriminated from the Landsat images in the area, I used an unsupervised classification algorithm, the Iterative Self-Organizing Data Analysis Technique (ISODATA), to identify different classes from the pixels. I then used the ISODATA results and visual inspection of calibrated Landsat images and Google Earth imagery to create Regions of Interest (ROIs) with the following land cover classes: Water, Shallow Water, Vegetation, Dark Vegetation, Salty Area, and Bare Earth. The ROIs were used in three supervised classification algorithms (maximum likelihood, minimum distance, and Mahalanobis distance) to classify land cover for the area. Using ArcGIS, I removed most of the misclassified area from the classified images by the use of the Landsat CDR, combined the main, north, and west images, and then extracted the watersheds from the combined image. The area in acres for each land cover class and watershed was computed and stored in graphs and tables.
After comparing the three supervised classifications using the amount of area classified into each category, the normalized area in each category, and the raster datasets, I determined that the minimum distance classification algorithm produced the most accurate land cover classification. I investigated the correlation of the land cover classes with the average precipitation, average discharge, average summer high temperature, and drought indicators. For the most part, the land cover changes correlate with the weather. However, land use changes, groundwater, and error in the land cover classes may have accounted for the instances of discrepancy. The correlations between the land cover classes and the weather data are statistically significant, except for Dark Vegetation and Bare Earth. This study shows that Landsat imagery has the necessary components to create and track land cover changes over time. These results can be useful in hydrological studies and can be applied to models.
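The difference between two of the supervised classifiers named above can be illustrated with a small sketch. It is not ENVI's implementation: the two-band pixel values are made up, only two of the land cover classes are used, and pooling a single covariance across classes is a simplifying assumption of this example. All function names are hypothetical.

```python
def mean2(points):
    """Mean of a list of (band1, band2) pixels."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def pooled_covariance(training, means):
    """Within-class covariance pooled over all classes, as (sxx, sxy, syy)."""
    sxx = sxy = syy = 0.0
    count = 0
    for label, points in training.items():
        mx, my = means[label]
        for x, y in points:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        count += len(points)
    dof = count - len(training)  # one degree of freedom lost per class mean
    return (sxx / dof, sxy / dof, syy / dof)

def euclidean_sq(pixel, mu):
    """Squared Euclidean distance, the basis of minimum-distance classification."""
    return (pixel[0] - mu[0]) ** 2 + (pixel[1] - mu[1]) ** 2

def mahalanobis_sq(pixel, mu, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = pixel[0] - mu[0], pixel[1] - mu[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def classify(pixel, means, distance):
    """Assign the pixel to the class whose mean is nearest under `distance`."""
    return min(means, key=lambda label: distance(pixel, means[label]))

# Made-up training pixels (two bands) for two regions of interest.
training = {
    "Water": [(0.10, 0.06), (0.12, 0.06), (0.14, 0.10), (0.16, 0.10)],
    "Vegetation": [(0.17, 0.06), (0.19, 0.06), (0.21, 0.10), (0.23, 0.10)],
}
means = {label: mean2(points) for label, points in training.items()}
cov = pooled_covariance(training, means)

pixel = (0.19, 0.14)
by_min_distance = classify(pixel, means, euclidean_sq)
by_mahalanobis = classify(pixel, means, lambda p, mu: mahalanobis_sq(p, mu, cov))
print("minimum distance:", by_min_distance)
print("Mahalanobis distance:", by_mahalanobis)
```

In this toy data both classes are elongated along the same direction, so a pixel that lies along that direction from one class mean can be nearer to the other mean in Euclidean terms. The two rules then disagree, which shows why the classifiers can produce different maps from the same ROIs. It says nothing about which one is more accurate for a given scene.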
|
20 |
Nearest Neighbor Foreign Exchange Rate Forecasting with Mahalanobis Distance
Pathirana, Vindya Kumari 01 January 2015
Foreign exchange (FX) rate forecasting has been a challenging area of study. Various linear and nonlinear methods have been used to forecast FX rates. As currency data are nonlinear and highly correlated, forecasting through nonlinear dynamical systems is becoming more relevant. The nearest neighbor (NN) algorithm is one of the most commonly used nonlinear pattern recognition and forecasting methods, and it outperforms the available linear forecasting methods on high-frequency foreign exchange data. The basic idea behind NN is to capture the local behavior of the data by selecting the instances with similar dynamic behavior. Only the k histories most relevant to the present dynamical structure are used to predict the future. For this reason, the NN algorithm is also known as the k-nearest neighbor (k-NN) algorithm, where k is the number of chosen neighbors.
In the k-nearest neighbor forecasting procedure, similar instances are captured through a distance function. Since the forecasts depend entirely on the chosen nearest neighbors, the distance plays a key role in the k-NN algorithm. By choosing an appropriate distance, we can improve the performance of the algorithm significantly. The most commonly used distance for k-NN forecasting in the past was the Euclidean distance. Because of possible correlation among vectors at different time frames, distances based on deterministic vectors, such as the Euclidean distance, are not very appropriate when applied to foreign exchange data. Since the Mahalanobis distance captures the correlations, we suggest using it in the selection of neighbors.
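The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's method: the series is a made-up periodic sequence rather than FX data, each history is a delay vector of length two so that the covariance inverse has a closed form, and the forecast is the unweighted mean of the values that followed the k nearest histories. All function names are hypothetical.

```python
def delay_vectors(series):
    """Pairs (x[t-1], x[t]) together with the value x[t+1] that followed."""
    return [((series[t - 1], series[t]), series[t + 1])
            for t in range(1, len(series) - 1)]

def covariance2(vectors):
    """Sample covariance of 2-D vectors, returned as (sxx, sxy, syy)."""
    n = len(vectors)
    mx = sum(v[0] for v in vectors) / n
    my = sum(v[1] for v in vectors) / n
    sxx = sum((v[0] - mx) ** 2 for v in vectors) / (n - 1)
    sxy = sum((v[0] - mx) * (v[1] - my) for v in vectors) / (n - 1)
    syy = sum((v[1] - my) ** 2 for v in vectors) / (n - 1)
    return (sxx, sxy, syy)

def mahalanobis_sq(u, v, cov):
    """Squared Mahalanobis distance between two 2-D vectors."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = u[0] - v[0], u[1] - v[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def knn_forecast(series, k):
    """One-step-ahead forecast from the k histories nearest to the latest one."""
    history = delay_vectors(series)
    query = (series[-2], series[-1])
    cov = covariance2([vec for vec, _ in history])
    ranked = sorted(history, key=lambda item: mahalanobis_sq(item[0], query, cov))
    neighbors = ranked[:k]
    return sum(following for _, following in neighbors) / k

# Made-up periodic series; the latest history (2.0, 3.0) was followed by 4.0 before.
series = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
forecast = knn_forecast(series, k=2)
print("one-step-ahead forecast:", forecast)
```

Swapping `mahalanobis_sq` for a Euclidean distance in `knn_forecast` changes only how neighbors are ranked, which is the single design choice the abstract argues about. On this toy series both distances would select the same neighbors; they differ when the components of the delay vectors are strongly correlated.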
In the present study, we used five foreign currencies, all among the most traded, to compare the performance of the k-NN algorithm under the traditional Euclidean and absolute distances with its performance under the proposed Mahalanobis distance. The performances were compared in two ways: (i) forecast accuracy and (ii) the results of transforming the forecasts into a more effective technical trading rule. The results were obtained with real FX trading data, and they showed that the method introduced in this work outperforms the other popular methods.
Furthermore, we conducted a thorough investigation of the optimal parameter choice with different distance measures. We applied distance-based weighting to the NN algorithm and compared its performance with forecasting based on the traditional unweighted NN algorithm.
Time series forecasting methods, such as the autoregressive integrated moving average (ARIMA) process, are widely used in many areas of time series analysis as a forecasting technique. We compared the performance of the proposed Mahalanobis distance based k-NN forecasting procedure with that of the traditional ARIMA-based forecasting algorithm. In this case the forecasts were also transformed into a technical trading strategy to create buy and sell signals. The two methods were evaluated for their forecasting accuracy and trading performance.
Multi-step-ahead forecasting is an important aspect of time series forecasting. Many researchers claim that the k-nearest neighbor forecasting procedure outperforms linear forecasting methods for financial time series data, and the available work in the literature supports this claim with one-step-ahead forecasting. One of our goals in this work was to improve FX trading with multi-step-ahead forecasting. A popular multi-step-ahead forecasting strategy was adopted in our work to obtain forecasts more than one day ahead. We performed a comparative study of the performance of a single-step-ahead trading strategy and a multi-step-ahead trading strategy, using data for five foreign currencies with the Mahalanobis distance based k-nearest neighbor algorithm.
|