11 |
AN ALL-ATTRIBUTES APPROACH TO SUPERVISED LEARNING
VANCE, DANNY W. January 2006 (has links)
No description available.
|
12 |
Identifying Interesting Posts on Social Media Sites
Seethakkagari, Swathi, M.S. 21 September 2012 (has links)
No description available.
|
13 |
Spatial Analysis of Retinal Pigment Epithelium Morphology
Huang, Haitao 12 August 2016 (has links)
In patients with age-related macular degeneration, the retinal pigment epithelium (RPE), a monolayer of cells in the eye, differs in morphology from that of healthy eyes. It is therefore important to quantify these morphological changes, which will help us better understand the physiology, disease progression, and classification. Classification of RPE morphometry has previously been accomplished with whole-tissue data. In this work, we focused on the spatial aspect of RPE morphometric analysis. We used second-order spatial analysis to reveal distinct patterns of cell clustering between normal and diseased eyes for both simulated and experimental human RPE data. We classified mouse genotype and age with the k-Nearest Neighbors algorithm. Radially aligned regions showed different classification power for several cell-shape variables. Our proposed methods provide a useful addition to the noninvasive classification and prognosis of eye disease.
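As a concrete illustration of the second-order analysis mentioned above, here is a minimal sketch (not the author's code) of a naive Ripley's K estimate, the standard second-order statistic for point patterns; values above the Poisson reference pi*r^2 suggest clustering at scale r. The cell coordinates and field size below are made up.

```python
import numpy as np

def ripley_k(points, area, radii):
    """Naive Ripley's K estimate for a point pattern in a region of given area.

    K(r) = area / (n*(n-1)) * #{ordered pairs (i, j), i != j, with d_ij <= r}.
    Edge correction is omitted for brevity.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distances between all cell centroids.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-pairs from the counts
    return np.array([area * (d <= r).sum() / (n * (n - 1)) for r in radii])

# Hypothetical RPE cell centroids in a 100 x 100 um field.
rng = np.random.default_rng(0)
cells = rng.uniform(0, 100, size=(200, 2))
radii = np.linspace(1, 25, 25)
k_hat = ripley_k(cells, area=100 * 100, radii=radii)
# Compare with the Poisson reference pi*r^2: positive values suggest clustering.
print(k_hat - np.pi * radii**2)
```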
|
14 |
Predicting gene–phenotype associations in humans and other species from orthologous and paralogous phenotypes
Woods, John Oates, III 21 February 2014 (has links)
Phenotypes and diseases may be related to seemingly dissimilar phenotypes in other species by means of the orthology of the underlying genes. Such "orthologous phenotypes," or "phenologs," are examples of deep homology, and one member of the orthology relationship may be used to predict candidate genes for its counterpart. (There exists evidence of "paralogous phenotypes" as well, but their validation is non-trivial.) In Chapter 2, I demonstrate the utility of including plant phenotypes in our database, and provide as an example the prediction of mammalian neural crest defects from an Arabidopsis thaliana phenotype, negative gravitropism defective. In the third chapter, I describe the incorporation of additional phenotypes into our database (including chicken, zebrafish, E. coli, and new C. elegans datasets). I present a method, developed in coordination with Martin Singh-Blom, for ranking predicted candidate genes by way of a k-nearest-neighbors naïve Bayes classifier drawing phenolog information from a variety of species. The fourth chapter relates to a computational method and application for identifying shared and overlapping pathways that contribute to phenotypes. I describe a method for rapidly querying a database of phenotype–gene associations for Boolean combinations of phenotypes, which yields improved predictions. This method offers insight into the divergence of orthologous pathways in evolution. I demonstrate connections between breast cancer and zebrafish methylmercury response (through oxidative stress and apoptosis); between human myopathy and plant red-light response genes, minus those involved in the water deprivation response (via autophagy); and between holoprosencephaly and an array of zebrafish phenotypes. In the first appendix, I present the SciRuby Project, which I co-founded in order to bring scientific libraries to the Ruby programming language. I describe the motivation behind SciRuby and my role in its creation. Finally, in Appendix B, I discuss the first beta release of NMatrix, a dense and sparse matrix library for the Ruby language, which I developed in part to facilitate and validate rapid phenolog searches. In this work, I describe the concept of phenologs as well as the development of the necessary computational tools for discovering phenotype orthology relationships, for predicting associated genes, and for statistically validating the discovered relationships and predicted associations.
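For context, the phenolog approach scores a candidate pair of cross-species phenotypes by how improbable the overlap of their ortholog-mapped gene sets would be by chance, modeled with the hypergeometric distribution. A minimal sketch with invented counts (not the thesis code):

```python
from scipy.stats import hypergeom

def phenolog_pvalue(n_orthologs, genes_a, genes_b, overlap):
    """Hypergeometric tail probability of seeing at least `overlap` shared
    genes when phenotype B's genes_b genes are drawn at random from the
    n_orthologs shared orthologs, genes_a of which belong to phenotype A."""
    return hypergeom.sf(overlap - 1, n_orthologs, genes_a, genes_b)

# Hypothetical example: 5000 shared orthologs between two species; phenotype A
# has 60 associated genes, phenotype B has 40, and 7 overlap after mapping.
print(phenolog_pvalue(5000, 60, 40, 7))
```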
|
15 |
Scaling out-of-core k-nearest neighbors computation on single machines
Olivares, Javier 19 December 2016 (has links)
K-Nearest Neighbors (KNN) is an efficient method for finding similar items within a large dataset. Over the years, a huge number of applications have used KNN's capabilities to discover similarities in data generated in areas as diverse as business, medicine, music, and computer science. Although years of research have produced several approaches to this algorithm, its implementation remains a challenge, particularly today, when data volumes are growing at unthinkable rates. In this context, running KNN on large datasets raises two major issues: huge memory footprints and very long runtimes. Because of these high costs in computational resources and time, state-of-the-art KNN works do not consider the fact that data can change over time, assuming instead that the data remains static throughout the computation, which unfortunately does not conform to reality at all. Our contributions in this thesis address these challenges. First, we propose an out-of-core approach to computing KNN on large datasets using a single commodity PC. We advocate this approach as an inexpensive way to scale the KNN computation, compared with the high cost of a distributed algorithm in terms of both computational resources and coding, debugging, and deployment effort. Second, we propose a multithreaded out-of-core approach to face the challenges of computing KNN on data that changes rapidly and continuously over time. After a thorough evaluation, we observe that our main contributions address the challenges of computing KNN on large datasets, leveraging the restricted resources of a single machine, decreasing runtimes compared with those of the baselines, and scaling the computation on both static and dynamic datasets.
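The core idea lends itself to a compact illustration. Below is a minimal sketch (not the thesis implementation, which also handles dynamic updates and multithreading): memory-map the dataset on disk, stream it through RAM block by block, and keep a running top-k per query. The file path, shapes, and dtypes are assumptions.

```python
import numpy as np

def knn_out_of_core(queries, path, n_points, dim, k=10, block=100_000):
    """Brute-force KNN over a dataset too large for RAM: the data lives on
    disk as a float32 matrix (n_points x dim); only one block of rows is
    resident in memory at a time."""
    data = np.memmap(path, dtype=np.float32, mode="r", shape=(n_points, dim))
    best_d = np.full((len(queries), k), np.inf)   # running k smallest distances
    best_i = np.full((len(queries), k), -1)       # and their point indices
    q2 = (queries ** 2).sum(axis=1)[:, None]      # query norms, reused per block
    for start in range(0, n_points, block):
        chunk = np.asarray(data[start:start + block], dtype=np.float64)
        # Squared distances via |q - c|^2 = |q|^2 + |c|^2 - 2 q.c
        d = q2 + (chunk ** 2).sum(axis=1)[None, :] - 2.0 * queries @ chunk.T
        idx = np.arange(start, start + len(chunk))
        # Merge this block's candidates with the running top-k and re-select.
        cand_d = np.concatenate([best_d, d], axis=1)
        cand_i = np.concatenate([best_i, np.tile(idx, (len(queries), 1))], axis=1)
        order = np.argsort(cand_d, axis=1)[:, :k]
        best_d = np.take_along_axis(cand_d, order, axis=1)
        best_i = np.take_along_axis(cand_i, order, axis=1)
    return best_d, best_i  # squared distances and indices of the k nearest
```

Squared distances are kept throughout since they rank neighbors identically to true distances and avoid a square root per pair.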
|
16 |
PREDICTION OF PUBLIC BUS TRANSPORTATION PLANNING BASED ON PASSENGER COUNT AND TRAFFIC CONDITIONS
Heidaripak, Samrend January 2021 (has links)
Artificial intelligence has become a hot topic in the past couple of years because of its potential for solving problems. Its most widely used subset today is machine learning, which is essentially the way a machine can learn to do tasks without explicit instructions. A problem that has historically been solved by common knowledge and experience is the planning of bus transportation, which has been prone to mistakes. This thesis investigates how to extract the key features of a raw dataset and whether a couple of machine learning algorithms can be applied to predict and plan public bus transportation while also considering the weather conditions. By using a pre-processing method to extract the features before creating and evaluating a k-nearest neighbors model as well as an artificial neural network model, predicting the passenger count on a given route can support the planning of bus transportation. The outcome of the thesis was that the feature extraction was successful and that both models could successfully predict the passenger count under normal conditions. However, under extreme conditions such as the pandemic during 2020, the models could not be shown to predict the passenger count successfully, nor could they be used to plan the bus transportation.
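As an illustration of the kind of model being compared here (a sketch under assumed column names, not the thesis code), a k-nearest-neighbors regressor predicting passenger counts from calendar and weather features might look like this:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset: one row per departure on a single route.
df = pd.read_csv("bus_trips.csv")  # assumed columns: hour, weekday, temperature, precipitation, passengers
X = df[["hour", "weekday", "temperature", "precipitation"]]
y = df["passengers"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling matters for kNN: distances should not be dominated by one feature's units.
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
model.fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))
```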
|
17 |
Efficient Algorithms for Data Mining with Federated Databases
Young, Barrington R. St. A. 03 July 2007 (has links)
No description available.
|
18 |
Predicting basketball performance based on draft pick : A classification analysis
Harmén, Fredrik January 2022 (has links)
In this thesis, we look to predict the performance of a basketball player coming into the NBA depending on where the player was picked in the NBA draft. This is done by testing different machine learning models on data from the previous 35 NBA drafts and then comparing the models to see which had the highest classification accuracy. The machine learning methods used are Linear Discriminant Analysis, K-Nearest Neighbors, Support Vector Machines, and Random Forests. The results show that the method with the highest classification accuracy was Random Forests, with an accuracy of 42%.
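A minimal sketch of such a four-way model comparison (methods and metric as named above; the feature matrix and labels are synthetic placeholders, since the draft data itself is not reproduced here):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Placeholder for draft-pick features and performance-tier labels.
X, y = make_classification(n_samples=1000, n_features=10, n_classes=4,
                           n_informative=6, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=10),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
}
for name, clf in models.items():
    # Mean 5-fold cross-validated accuracy, the comparison metric above.
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.2f}")
```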
|
19 |
Attesting compliance of biodiesel quality using classification methods
LOPES, Marcus Vinicius de Sousa 26 July 2017 (has links)
The growing demand for energy and the limitations of oil reserves have led to the search for renewable and sustainable energy sources to replace, even if partially, fossil fuels. Biodiesel has become in recent decades the main alternative to petroleum diesel. Its quality is evaluated by parameters and specifications that vary by country or region, for example in Europe (EN 14214), the US (ASTM D6751), and Brazil (RANP 45/2014), among others. Some of these parameters are intrinsically related to the composition of the fatty acid methyl esters (FAMEs) of the biodiesel, such as viscosity, density, oxidative stability, and iodine value, which makes it possible to relate the behavior of these properties to the size of the carbon chain and the presence of unsaturation in the molecules. In the present work, four direct classification methods (support vector machine, k-nearest neighbors, decision tree, and artificial neural networks) were optimized and compared for classifying biodiesel samples according to their compliance with viscosity, density, oxidative stability, and iodine value specifications, taking the composition of fatty acid methyl esters as input, since those parameters are intrinsically related to the composition of the biodiesel. The classifications were carried out under the specifications of the standards EN 14214, ASTM D6751, and RANP 45/2014. A comparison between these direct classification methods and empirical equations (indirect classification) favored the direct classification methods on the problem addressed, especially when the biodiesel samples have property values very close to the limits of the considered specifications. / The growing demand for renewable energy sources as alternatives to fossil fuels makes biodiesel one of the main candidates for replacing petroleum derivatives. Quality control of biodiesel during production and distribution is extremely important to guarantee a fuel of reliable quality and satisfactory performance for the end user. Biodiesel is characterized by measuring certain properties according to international standards. Using machine learning methods to characterize biodiesel saves time and money. This work shows that, for determining the compliance of a biodiesel, the SVM, KNN, and decision tree classifiers give better results than the prediction methods of previous works. For the properties of viscosity, density, iodine value, and oxidative stability (RANP 45/2014, EN 14214:2014, and ASTM D6751-15), the KNN and decision tree classifiers proved to be the best options. These results show that the classifiers can be applied in practice to save time and financial and human resources.
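A hedged sketch of the direct-classification setup described above (not the author's code): label each sample compliant or not for a single property, e.g. kinematic viscosity under EN 14214 (3.5 to 5.0 mm^2/s at 40 °C), and train a classifier on its FAME composition. The file name and column names are invented.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical table: FAME mass fractions per sample plus a measured viscosity.
df = pd.read_csv("biodiesel_fames.csv")  # e.g. columns C16:0, C18:0, C18:1, C18:2, C18:3, viscosity
X = df[["C16:0", "C18:0", "C18:1", "C18:2", "C18:3"]]

# EN 14214 limits for kinematic viscosity at 40 C: 3.5 to 5.0 mm^2/s.
y = df["viscosity"].between(3.5, 5.0).astype(int)  # 1 = compliant

clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```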
|
20 |
Construction of the Intensity-Duration-Frequency (IDF) Curves under Climate Change
December 2014 (has links)
Intensity-Duration-Frequency (IDF) curves are among the standard design tools for various engineering applications, such as storm water management systems. The current practice is to use IDF curves based on historical extreme precipitation quantiles. A warming climate, however, might change the extreme precipitation quantiles represented by the IDF curves, emphasizing the need for updating the IDF curves used for the design of urban storm water management systems in different parts of the world, including Canada.
This study attempts to construct future IDF curves for Saskatoon, Canada, under possible climate change scenarios. For this purpose, LARS-WG, a stochastic weather generator, is used to spatially downscale the daily precipitation projected by Global Climate Models (GCMs) from coarse grid resolution to the local point scale. The stochastically downscaled daily precipitation realizations are further disaggregated into ensemble hourly and sub-hourly (as fine as 5-minute) precipitation series, using a disaggregation scheme based on the K-nearest neighbor (K-NN) technique. This two-stage modeling framework (downscaling to daily, then disaggregating to finer resolutions) is applied to construct the future IDF curves for the city of Saskatoon. The sensitivity of the K-NN disaggregation model to the number of nearest neighbors (i.e., window size) is evaluated over the baseline period (1961-1990), and the optimal window size is assigned based on how well the K-NN disaggregation models reproduce the historical IDF curves. Two optimal window sizes are selected, one each for the K-NN hourly and sub-hourly disaggregation models, that would be appropriate for the hydrological system of Saskatoon. Using the simulated hourly and sub-hourly precipitation series and the Generalized Extreme Value (GEV) distribution, future changes in the IDF curves and the associated uncertainties are quantified using a large ensemble of projections obtained for the Canadian and British GCMs (CanESM2 and HadGEM2-ES) under three Representative Concentration Pathways (RCP2.6, RCP4.5, and RCP8.5) available from CMIP5, the most recent product of the Intergovernmental Panel on Climate Change (IPCC). The constructed IDF curves are then compared with ones constructed using another method based on a genetic programming (GP) technique.
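A simplified sketch of the K-NN resampling idea behind the disaggregation step (the thesis scheme is more elaborate, e.g. in its predictors and window handling): for each simulated day, find the k most similar historical days by daily total, resample one at random, and rescale its observed hourly pattern to the simulated total.

```python
import numpy as np

def knn_disaggregate(daily_sim, daily_obs, hourly_obs, k=5, rng=None):
    """Disaggregate simulated daily precipitation totals to hourly values.

    daily_obs:  (n_days,) historical daily totals
    hourly_obs: (n_days, 24) the corresponding observed hourly values
    """
    rng = np.random.default_rng(rng)
    out = np.zeros((len(daily_sim), 24))
    for i, total in enumerate(daily_sim):
        if total <= 0:
            continue  # dry simulated day stays dry
        nearest = np.argsort(np.abs(daily_obs - total))[:k]
        j = rng.choice(nearest)  # resample one analog day from the window
        pattern = hourly_obs[j]
        frac = pattern / pattern.sum() if pattern.sum() > 0 else np.full(24, 1 / 24)
        out[i] = total * frac  # analog's shape, rescaled to preserve the daily total
    return out
```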
The results show that the sign and magnitude of future variations in extreme precipitation quantiles are sensitive to the selection of GCMs and/or RCPs, and the variations appear to intensify towards the end of the 21st century. Generally, the relative change in precipitation intensities with respect to historical intensities for CMIP5 climate models (e.g., CanESM2: RCP4.5) is smaller than for CMIP3 climate models (e.g., CGCM3.1: B1), which may be due to the inclusion of climate policies (i.e., adaptation and mitigation) in the CMIP5 scenarios. The two-stage downscaling-disaggregation method enables quantification of the uncertainty due to the natural internal variability of precipitation, the choice of GCMs and RCPs, and the downscaling methods. In general, uncertainty in the projections of future extreme precipitation quantiles increases for short durations and long return periods. Both the two-stage method adopted in this study and the GP method reconstruct the historical IDF curves quite successfully during the baseline period (1961-1990), which suggests that these methods can be applied to efficiently construct IDF curves at the local scale under future climate scenarios. The most notable precipitation intensification in Saskatoon is projected to occur with shorter storm durations, up to one hour, and longer return periods of more than 25 years.
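For illustration, the final GEV step described above can be sketched as follows (the annual-maximum intensities are placeholders, not Saskatoon data): fit a GEV distribution to annual maxima for one duration, then read off intensities at the desired return periods.

```python
import numpy as np
from scipy.stats import genextreme

# Placeholder: annual maximum 1-hour rainfall intensities (mm/h), one per year.
annual_max = np.array([12.1, 15.4, 9.8, 22.0, 17.3, 11.2, 19.6, 14.0,
                       25.3, 13.7, 16.9, 10.5, 21.1, 18.4, 12.8])

shape, loc, scale = genextreme.fit(annual_max)  # maximum-likelihood GEV fit
for T in (2, 5, 10, 25, 50, 100):
    # The T-year return level is the quantile exceeded with probability 1/T per year.
    intensity = genextreme.ppf(1 - 1 / T, shape, loc=loc, scale=scale)
    print(f"{T}-year 1-hour intensity: {intensity:.1f} mm/h")
```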
|