11 |
AN ALL-ATTRIBUTES APPROACH TO SUPERVISED LEARNING
VANCE, DANNY W. January 2006 (has links)
No description available.
|
12 |
Identifying Interesting Posts on Social Media Sites
Seethakkagari, Swathi, M.S. 21 September 2012 (has links)
No description available.
|
13 |
Spatial Analysis of Retinal Pigment Epithelium Morphology
Huang, Haitao 12 August 2016 (has links)
In patients with age-related macular degeneration, the cells of the retinal pigment epithelium (RPE), a monolayer in the eye, differ from healthy ones in morphology. It is therefore important to quantify these morphological changes, which will help us better understand the physiology and progression of the disease and improve its classification. Classification of RPE morphometry has previously been accomplished with whole-tissue data. In this work, we focused on the spatial aspect of RPE morphometric analysis. We used second-order spatial analysis to reveal distinct patterns of cell clustering between normal and diseased eyes for both simulated and experimental human RPE data. We classified mouse genotype and age with the k-Nearest Neighbors algorithm. Radially aligned regions showed different classification power for several cell shape variables. Our proposed methods provide a useful, noninvasive addition to the classification and prognosis of eye disease.
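As an illustration of the classification step described above, the sketch below trains a k-Nearest Neighbors classifier on per-cell shape descriptors. The feature names (area, eccentricity, solidity) and the synthetic measurements are assumptions for the example, not the study's actual morphometric variables.

```python
# Sketch: classify normal vs. diseased RPE from cell-shape features with k-NN.
# The feature set and synthetic data are illustrative assumptions only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
# Hypothetical per-cell shape descriptors: area, eccentricity, solidity.
healthy = rng.normal([250.0, 0.45, 0.95], [30.0, 0.05, 0.02], size=(n, 3))
diseased = rng.normal([310.0, 0.60, 0.88], [60.0, 0.08, 0.04], size=(n, 3))
X = np.vstack([healthy, diseased])
y = np.array([0] * n + [1] * n)  # 0 = normal, 1 = diseased

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
# Standardize first: k-NN distances are scale-sensitive.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```

Standardizing matters here because raw Euclidean distances would otherwise be dominated by the feature with the largest numeric range (cell area, in this toy setup).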
|
14 |
Predicting gene–phenotype associations in humans and other species from orthologous and paralogous phenotypes
Woods, John Oates, III 21 February 2014 (has links)
Phenotypes and diseases may be related to seemingly dissimilar phenotypes in other species by means of the orthology of the underlying genes. Such "orthologous phenotypes," or "phenologs," are examples of deep homology, and one member of the orthology relationship may be used to predict candidate genes for its counterpart. (There exists evidence of "paralogous phenotypes" as well, but their validation is non-trivial.) In Chapter 2, I demonstrate the utility of including plant phenotypes in our database, and provide as an example the prediction of mammalian neural crest defects from an Arabidopsis thaliana phenotype, negative gravitropism defective. In the third chapter, I describe the incorporation of additional phenotypes into our database (including chicken, zebrafish, E. coli, and new C. elegans datasets). I present a method, developed in coordination with Martin Singh-Blom, for ranking predicted candidate genes with a k-nearest-neighbors naïve Bayes classifier that draws phenolog information from a variety of species. The fourth chapter relates to a computational method and application for identifying shared and overlapping pathways which contribute to phenotypes. I describe a method for rapidly querying a database of phenotype-gene associations for Boolean combinations of phenotypes, which yields improved predictions. This method offers insight into the divergence of orthologous pathways in evolution. I demonstrate connections between breast cancer and zebrafish methylmercury response (through oxidative stress and apoptosis); between human myopathy and plant red light response genes, minus those involved in the water deprivation response (via autophagy); and between holoprosencephaly and an array of zebrafish phenotypes. In the first appendix, I present the SciRuby Project, which I co-founded in order to bring scientific libraries to the Ruby programming language. I describe the motivation behind SciRuby and my role in its creation. Finally, in Appendix B, I discuss the first beta release of NMatrix, a dense and sparse matrix library for the Ruby language, which I developed in part to facilitate and validate rapid phenolog searches. In this work, I describe the concept of phenologs as well as the development of the necessary computational tools for discovering phenotype orthology relationships, for predicting associated genes, and for statistically validating the discovered relationships and predicted associations.
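The published phenolog approach scores a cross-species phenotype pair by how improbable the overlap of their associated ortholog sets is under the hypergeometric distribution. Below is a minimal sketch of that overlap test; all of the set sizes are made up for illustration.

```python
# Sketch: significance of gene-set overlap between two phenotypes in
# different species, mapped into a shared space of N orthologous gene
# pairs, modeled with the hypergeometric distribution as in the phenolog
# approach. The numbers below are illustrative assumptions.
from scipy.stats import hypergeom

N = 5000   # orthologous gene pairs shared by the two species (assumed)
n1 = 40    # orthologs associated with the phenotype in species 1
n2 = 60    # orthologs associated with the phenotype in species 2
k = 8      # orthologs associated with both phenotypes

# P(overlap >= k) when n2 genes are drawn from N containing n1 "successes".
p_value = hypergeom.sf(k - 1, N, n1, n2)
print(f"P(overlap >= {k}) = {p_value:.3e}")
```

Pairs with very small p-values are candidate phenologs, and genes linked to the phenotype in one species but not yet in the other become ranked candidate genes.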
|
15 |
Scaling out-of-core k-nearest neighbors computation on single machines / Faire passer à l'échelle le calcul "out-of-core" des K-plus proche voisins sur une seule machine
Olivares, Javier 19 December 2016 (has links)
The K-Nearest Neighbors (KNN) algorithm is an efficient method for finding similar data within a large dataset. Over the years, a huge number of applications have used KNN's capabilities to discover similarities in data generated in areas as diverse as business, medicine, music, and computer science. Although years of research have produced several approaches to this algorithm, its implementation remains a challenge, particularly today, when data is growing at unthinkable rates. In this context, running KNN on large datasets raises two major issues: huge memory footprints and very long runtimes. Because of these high costs in terms of computational resources and time, state-of-the-art KNN works do not consider the fact that data can change over time; they assume that the data remains static throughout the computation, which unfortunately does not conform to reality at all. In this thesis, we address these challenges. Firstly, we propose an out-of-core approach to compute KNN on large datasets using a single commodity PC. We advocate this approach as an inexpensive way to scale the KNN computation, compared to the high cost of a distributed algorithm, in terms of computational resources as well as coding, debugging, and deployment effort. Secondly, we propose a multithreaded out-of-core approach to face the challenges of computing KNN on data that changes rapidly and continuously over time. After a thorough evaluation, we observe that our main contributions address the challenges of computing KNN on large datasets by leveraging the restricted resources of a single machine, decreasing runtimes compared to the baselines, and scaling the computation on both static and dynamic datasets.
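To make the out-of-core idea concrete, here is a minimal single-machine sketch: the dataset lives on disk and is streamed through memory in fixed-size blocks, so the memory footprint stays bounded regardless of dataset size. It is a brute-force baseline under simplifying assumptions (static data, one query batch); the file name and sizes are illustrative, not the thesis's actual system.

```python
# Sketch of out-of-core k-NN: exact neighbors for a query batch against a
# dataset kept on disk, streamed block by block via numpy.memmap.
import numpy as np

n_points, dim, k, block = 200_000, 32, 10, 20_000
rng = np.random.default_rng(0)

# Write a synthetic dataset to disk once (stand-in for a real corpus).
disk = np.memmap("points.f32", dtype=np.float32, mode="w+",
                 shape=(n_points, dim))
for s in range(0, n_points, block):
    disk[s:s + block] = rng.random((min(block, n_points - s), dim),
                                   dtype=np.float32)
disk.flush()

queries = rng.random((100, dim), dtype=np.float32)
q_sq = (queries ** 2).sum(axis=1, keepdims=True)        # (n_queries, 1)

best_d = np.full((len(queries), k), np.inf, dtype=np.float32)
best_i = np.zeros((len(queries), k), dtype=np.int64)

for s in range(0, n_points, block):
    chunk = np.asarray(disk[s:s + block])               # one block in RAM
    c_sq = (chunk ** 2).sum(axis=1)                     # (block,)
    # Squared Euclidean distances via ||q||^2 + ||c||^2 - 2 q.c
    d = np.maximum(q_sq + c_sq - 2.0 * queries @ chunk.T, 0.0)
    cand_d = np.concatenate([best_d, d], axis=1)
    idx = np.broadcast_to(np.arange(s, s + len(chunk)), d.shape)
    cand_i = np.concatenate([best_i, idx], axis=1)
    keep = np.argsort(cand_d, axis=1)[:, :k]            # running top-k merge
    best_d = np.take_along_axis(cand_d, keep, axis=1)
    best_i = np.take_along_axis(cand_i, keep, axis=1)

print(best_i[0])  # indices of the 10 nearest neighbors of the first query
```

Only one block plus the running top-k lists are ever resident, which is the essential trade: more disk reads in exchange for a memory footprint independent of n_points.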
|
16 |
PREDICTION OF PUBLIC BUS TRANSPORTATION PLANNING BASED ON PASSENGER COUNT AND TRAFFIC CONDITIONS
Heidaripak, Samrend January 2021 (has links)
Artificial intelligence has become a hot topic in the past couple of years because of its potential for solving problems. The most widely used subset of artificial intelligence today is machine learning, which is essentially the way a machine can learn to perform tasks without explicit instructions. A problem that has historically been solved by common knowledge and experience, and that has been prone to mistakes, is the planning of bus transportation. This thesis investigates how to extract the key features of a raw dataset and whether a couple of machine learning algorithms can be applied to predict and plan public bus transportation while also considering weather conditions. By using a pre-processing method to extract the features before creating and evaluating a k-nearest neighbors model as well as an artificial neural network model, predicting the passenger count on a given route could help in planning the bus transportation. The outcome of the thesis was that the feature extraction was successful and that both models could successfully predict the passenger count under normal conditions. However, in extreme conditions such as the pandemic during 2020, the models could not be shown to successfully predict the passenger count nor be used to plan the bus transportation.
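A minimal sketch of the prediction step with a k-nearest neighbors regressor follows. The input features (hour of day, weekday, temperature, rainfall) and the synthetic passenger counts are assumptions for illustration; the thesis's actual extracted features and its neural network model are not reproduced here.

```python
# Sketch: predict passenger counts from time and weather features with k-NN.
# Features and the synthetic target are illustrative assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 1000
hour = rng.integers(5, 24, n)         # departure hour
weekday = rng.integers(0, 7, n)       # 0 = Monday ... 6 = Sunday
temp_c = rng.normal(8, 10, n)         # air temperature
rain_mm = rng.exponential(1.0, n)     # precipitation
X = np.column_stack([hour, weekday, temp_c, rain_mm])
# Synthetic target: rush-hour peaks, fewer riders on weekends and in rain.
y = (20 + 15 * np.isin(hour, [7, 8, 16, 17]) - 8 * (weekday >= 5)
     - 2 * rain_mm + rng.normal(0, 3, n))

model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=7))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("5-fold R^2:", scores.round(2))
```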
|
17 |
Efficient Algorithms for Data Mining with Federated Databases
Young, Barrington R. St. A. 03 July 2007 (has links)
No description available.
|
18 |
Predicting basketball performance based on draft pick : A classification analysis
Harmén, Fredrik January 2022 (has links)
In this thesis, we look to predict the performance of basketball players coming into the NBA depending on where they were picked in the NBA draft. This is done by testing different machine learning models on data from the previous 35 NBA drafts and then comparing the models to see which had the highest classification accuracy. The machine learning methods used are Linear Discriminant Analysis, K-Nearest Neighbors, Support Vector Machines, and Random Forests. The results show that the method with the highest classification accuracy was Random Forests, at 42%.
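A sketch of this kind of model comparison with scikit-learn is shown below, run on a synthetic stand-in dataset rather than the actual draft data; it cross-validates the four classifier families named above and prints their mean accuracies.

```python
# Sketch: compare LDA, k-NN, SVM, and random forest by cross-validation.
# The dataset is a synthetic placeholder, not the 35 years of draft data.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)
models = {
    "LDA": LinearDiscriminantAnalysis(),
    "k-NN": KNeighborsClassifier(n_neighbors=10),
    "SVM": SVC(kernel="rbf", C=1.0),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=0),
}
for name, clf in models.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```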
|
19 |
Aplicação de classificadores para determinação de conformidade de biodiesel / Attesting compliance of biodiesel quality using classification methods
LOPES, Marcus Vinicius de Sousa 26 July 2017 (has links)
The growing demand for energy and the limitations of oil reserves have led to the search for renewable and sustainable energy sources to replace, even partially, fossil fuels. Biodiesel has become in recent decades the main alternative to petroleum diesel. Its quality is evaluated by given parameters and specifications, which vary according to country or region, for example, in Europe (EN 14214), the US (ASTM D6751), and Brazil (RANP 45/2014), among others. Some of these parameters, such as viscosity, density, oxidative stability, and iodine value, are intrinsically related to the composition of fatty acid methyl esters (FAMEs) of the biodiesel, which allows relating the behavior of these properties to the size of the carbon chain and the presence of unsaturation in the molecules. In the present work, four methods for direct classification (support vector machine, k-nearest neighbors, decision tree classifier, and artificial neural networks) were optimized and compared to classify biodiesel samples according to their compliance with viscosity, density, oxidative stability, and iodine value, taking the composition of fatty acid methyl esters as input, since those parameters are intrinsically related to the composition of the biodiesel. The classifications were carried out under the specifications of the standards EN 14214, ASTM D6751, and RANP 45/2014. A comparison between these methods of direct classification and empirical equations (indirect classification) distinguished the direct classification methods positively in the problem addressed, especially when the biodiesel samples have property values very close to the limits of the considered specifications. / The growing demand for renewable energy sources as an alternative to fossil fuels makes biodiesel one of the main candidates for replacing petroleum derivatives. Quality control of biodiesel during production and distribution is extremely important to guarantee a fuel of reliable quality and satisfactory performance for the end user. Biodiesel is characterized by measuring certain properties according to international standards, and using machine learning methods for this characterization saves time and money. This work shows that, for determining the compliance of a biodiesel, the SVM, KNN, and decision tree classifiers give better results than the prediction methods of previous works. For the properties of viscosity, density, iodine value, and oxidative stability (RANP 45/2014, EN 14214:2014, and ASTM D6751-15), the KNN and decision tree classifiers proved to be the best options. These results show that the classifiers can be applied in practice to save time and financial and human resources.
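The sketch below illustrates the direct-classification idea: label samples compliant or not from their FAME composition alone. The FAME fractions, the toy viscosity rule used to create the labels, and the cutoff value are all assumptions for the example, not the procedures of EN 14214, ASTM D6751, or RANP 45/2014.

```python
# Sketch: flag biodiesel samples as compliant/non-compliant from their
# fatty acid methyl ester (FAME) composition. Compositions, the toy
# viscosity model, and the 3.5 cutoff are illustrative assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n = 400
# Hypothetical mass fractions of five common FAMEs (rows sum to 1):
# C16:0, C18:0, C18:1, C18:2, C18:3.
comp = rng.dirichlet([4, 3, 6, 5, 1], size=n)
# Toy "viscosity": saturated esters raise it, polyunsaturated lower it.
visc = (3.0 + 2.0 * comp[:, 0] + 2.5 * comp[:, 1] - 1.5 * comp[:, 3]
        + rng.normal(0, 0.1, n))
y = (visc <= 3.5).astype(int)  # 1 = within an assumed viscosity limit

for clf in (KNeighborsClassifier(5), DecisionTreeClassifier(max_depth=4)):
    acc = cross_val_score(clf, comp, y, cv=5).mean()
    print(f"{type(clf).__name__}: {acc:.2f}")
```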
|
20 |
Construction of the Intensity-Duration-Frequency (IDF) Curves under Climate Change
2014 December 1900 (has links)
Intensity-Duration-Frequency (IDF) curves are among the standard design tools for various engineering applications, such as storm water management systems. The current practice is to use IDF curves based on historical extreme precipitation quantiles. A warming climate, however, might change the extreme precipitation quantiles represented by the IDF curves, emphasizing the need for updating the IDF curves used for the design of urban storm water management systems in different parts of the world, including Canada.
This study attempts to construct future IDF curves for Saskatoon, Canada, under possible climate change scenarios. For this purpose, LARS-WG, a stochastic weather generator, is used to spatially downscale the daily precipitation projected by Global Climate Models (GCMs) from coarse grid resolution to the local point scale. The stochastically downscaled daily precipitation realizations are further disaggregated into ensemble hourly and sub-hourly (as fine as 5-minute) precipitation series, using a disaggregation scheme based on the K-nearest neighbor (K-NN) technique. This two-stage modeling framework (downscaling to daily, then disaggregating to finer resolutions) is applied to construct the future IDF curves for the city of Saskatoon. The sensitivity of the K-NN disaggregation model to the number of nearest neighbors (i.e., the window size) is evaluated during the baseline period (1961-1990), and the optimal window size is selected based on how well the K-NN disaggregation models reproduce the historical IDF curves. Two optimal window sizes are selected, one for the K-NN hourly and one for the sub-hourly disaggregation model, appropriate for the hydrological system of Saskatoon. Using the simulated hourly and sub-hourly precipitation series and the Generalized Extreme Value (GEV) distribution, future changes in the IDF curves and the associated uncertainties are quantified using a large ensemble of projections obtained for the Canadian and British GCMs (CanESM2 and HadGEM2-ES) under three Representative Concentration Pathways (RCP2.6, RCP4.5, and RCP8.5) available from CMIP5, the most recent product of the Intergovernmental Panel on Climate Change (IPCC). The constructed IDF curves are then compared with ones constructed using another method based on a genetic programming (GP) technique.
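A minimal sketch of the K-NN disaggregation step is given below: a simulated daily total borrows the sub-daily pattern of one of its w nearest historical days, where closeness is measured on daily totals and a neighbor is sampled with the common 1/rank weights. The historical record here is synthetic, and the single-feature distance is a simplification of the scheme described above.

```python
# Sketch of K-NN disaggregation: give a simulated daily total the hourly
# pattern of one of its w nearest historical days. The historical data
# are synthetic stand-ins; w plays the role of the window size.
import numpy as np

rng = np.random.default_rng(3)
hist_hourly = rng.exponential(0.5, size=(365, 24))   # stand-in observations
hist_daily = hist_hourly.sum(axis=1)

def disaggregate(daily_total, w=10):
    # w nearest historical days by absolute difference in daily totals.
    nn = np.argsort(np.abs(hist_daily - daily_total))[:w]
    weights = 1.0 / np.arange(1, w + 1)
    weights /= weights.sum()                         # discrete 1/rank kernel
    pick = rng.choice(nn, p=weights)                 # sample one neighbor
    frac = hist_hourly[pick] / hist_hourly[pick].sum()
    return daily_total * frac                        # rescale its pattern

hourly = disaggregate(22.0)
print(hourly.sum())  # preserves the daily total: 22.0
```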
The results show that the sign and the magnitude of future variations in extreme precipitation quantiles are sensitive to the selection of GCMs and/or RCPs, and the variations seem to intensify towards the end of the 21st century. Generally, the relative change in precipitation intensities with respect to the historical intensities is smaller for CMIP5 climate models (e.g., CanESM2: RCP4.5) than for CMIP3 climate models (e.g., CGCM3.1: B1), which may be due to the inclusion of climate policies (i.e., adaptation and mitigation) in the CMIP5 climate models. The two-stage downscaling-disaggregation method enables quantification of the uncertainty due to the natural internal variability of precipitation, the choice of GCMs and RCPs, and the downscaling methods. In general, uncertainty in the projections of future extreme precipitation quantiles increases for short durations and long return periods. Both the two-stage method adopted in this study and the GP method reconstruct the historical IDF curves quite successfully during the baseline period (1961-1990); this suggests that these methods can be applied to efficiently construct IDF curves at the local scale under future climate scenarios. The most notable precipitation intensification in Saskatoon is projected to occur for shorter storm durations, up to one hour, and longer return periods of more than 25 years.
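For the quantile step, here is a short sketch of fitting the GEV distribution to an annual-maximum series and reading off return levels, which gives one point of an IDF curve per duration; the data below are synthetic, not Saskatoon observations.

```python
# Sketch: fit a GEV to annual maximum intensities and compute return
# levels. The annual-maximum series is synthetic, for illustration only.
from scipy.stats import genextreme

# Stand-in annual maxima of 1-hour rainfall intensity (mm/h), 30 years.
# Note: scipy's shape parameter c corresponds to -xi in common GEV notation.
annual_max = genextreme.rvs(c=-0.1, loc=20, scale=5, size=30, random_state=42)

shape, loc, scale = genextreme.fit(annual_max)
for T in (2, 10, 25, 50, 100):                 # return periods in years
    # T-year return level = quantile with non-exceedance prob. 1 - 1/T.
    level = genextreme.ppf(1 - 1 / T, shape, loc=loc, scale=scale)
    print(f"{T:>3}-yr return level: {level:.1f} mm/h")
```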
|