Spelling suggestions: "subject:"K nearest neighbor""
41 |
Detekce přítomnosti osob pomocí IoT senzorů / Room Occupancy Detection with IoT SensorsKolarčík, Tomáš January 2021 (has links)
The aim of this work was to create a module for home automation tools Home Assistant. The module is able to determine which room is inhabited and estimate more accurate position of people inside the room. Known GPS location cannot be used for this purpose because it is inaccurate inside buildings and therefore one of the indoor location techniques needs to be used. Solution based on Bluetooth Low Energy wireless technology was chosen. The localization technique is the fingerprinting method, which is based on estimating the position according to the signal strength at any point in space, which are compared with a database of these points using machine learning. The system can be supplemented with motion sensors that ensure a quick response when entering the room. This system can be deployed within a house, apartment or small to medium-sized company to determine the position of people in the building and can serve as a very powerful element of home automation.
|
42 |
Kombination von terrestrischen Aufnahmen und Fernerkundungsdaten mit Hilfe der kNN-Methode zur Klassifizierung und Kartierung von WäldernStümer, Wolfgang 24 August 2004 (has links)
Bezüglich des Waldes hat sich in den letzten Jahren seitens der Politik und Wirtschaft ein steigender Informationsbedarf entwickelt. Zur Bereitstellung dieses Bedarfes stellt die Fernerkundung ein wichtiges Hilfsmittel dar, mit dem sich flächendeckende Datengrundlagen erstellen lassen. Die k-nächsten-Nachbarn-Methode (kNN-Methode), die terrestrische Aufnahmen mit Fernerkundungsdaten kombiniert, stellt eine Möglichkeit dar, diese Datengrundlage mit Hilfe der Fernerkundung zu verwirklichen. Deshalb beschäftigt sich die vorliegende Dissertation eingehend mit der kNN-Methode. An Hand der zwei Merkmale Grundfläche (metrische Daten) und Totholz (kategoriale Daten) wurden umfangreiche Berechnungen durchgeführt, wobei verschiedenste Variationen der kNN-Methode berücksichtigt wurden. Diese Variationen umfassen verschiedenste Einstellungen der Distanzfunktion, der Wichtungsfunktion und der Anzahl k-nächsten Nachbarn. Als Fernerkundungsdatenquellen kamen Landsat- und Hyperspektraldaten zum Einsatz, die sich sowohl von ihrer spektralen wie auch ihrer räumlichen Auflösung unterscheiden. Mit Hilfe von Landsat-Szenen eines Gebietes von verschiedenen Zeitpunkten wurde außerdem der multitemporale Ansatz berücksichtigt. Die terrestrische Datengrundlage setzt sich aus Feldaufnahmen mit verschiedenen Aufnahmedesigns zusammen, wobei ein wichtiges Kriterium die gleichmäßige Verteilung von Merkmalswerten (z.B. Grundflächenwerten) über den Merkmalsraum darstellt. Für die Durchführung der Berechnungen wurde ein Programm mit Visual Basic programmiert, welches mit der Integrierung aller Funktionen auf der Programmoberfläche eine benutzerfreundliche Bedienung ermöglicht. Die pixelweise Ausgabe der Ergebnisse mündete in detaillierte Karten und die Verifizierung der Ergebnisse wurde mit Hilfe des prozentualen Root Mean Square Error und der Bootstrap-Methode durchgeführt. Die erzielten Genauigkeiten für das Merkmal Grundfläche liegen zwischen 35 % und 67 % (Landsat) bzw. zwischen 65 % und 67 % (HyMapTM). Für das Merkmal Totholz liegen die Übereinstimmungen zwischen den kNN-Schätzern und den Referenzwerten zwischen 60,0 % und 73,3 % (Landsat) und zwischen 60,0 % und 63,3 % (HyMapTM). Mit den erreichten Genauigkeiten bietet sich die kNN-Methode für die Klassifizierung von Beständen bzw. für die Integrierung in Klassifizierungsverfahren an. / Mapping forest variables and associated characteristics is fundamental for forest planning and management. The following work describes the k-nearest neighbors (kNN) method for improving estimations and to produce maps for the attributes basal area (metric data) and deadwood (categorical data). Several variations within the kNN-method were tested, including: distance metric, weighting function and number of neighbors. As sources of remote sensing Landsat TM satellite images and hyper spectral data were used, which differ both from their spectral as well as their spatial resolutions. Two Landsat scenes from the same area acquired September 1999 and 2000 regard multiple approaches. The field data for the kNN- method comprise tree field measurements which were collected from the test site Tharandter Wald (Germany). The three field data collections are characterized by three different designs. For the kNN calculation a program with integration all kNN functions were developed. The relative root mean square errors (RMSE) and the Bootstrap method were evaluated in order to find optimal parameters. The estimation accuracy for the attribute basal area is between 35 % and 67 % (Landsat) and 65 % and 67 % (HyMapTM). For the attribute deadwood is the accuracy between 60 % and 73 % (Landsat) and 60 % and 63 % (HyMapTM). Recommendations for applying the kNN method for mapping and regional estimation are provided.
|
43 |
Data Driven Energy Efficiency of ShipsTaspinar, Tarik January 2022 (has links)
Decreasing the fuel consumption and thus greenhouse gas emissions of vessels has emerged as a critical topic for both ship operators and policy makers in recent years. The speed of vessels has long been recognized to have highest impact on fuel consumption. The solution suggestions like "speed optimization" and "speed reduction" are ongoing discussion topics for International Maritime Organization. The aim of this study are to develop a speed optimization model using time-constrained genetic algorithms (GA). Subsequent to this, this paper also presents the application of machine learning (ML) regression methods in setting up a model with the aim of predicting the fuel consumption of vessels. Local outlier factor algorithm is used to eliminate outlier in prediction features. In boosting and tree-based regression prediction methods, the overfitting problem is observed after hyperparameter tuning. Early stopping technique is applied for overfitted models.In this study, speed is also found as the most important feature for fuel consumption prediction models. On the other hand, GA evaluation results showed that random modifications in default speed profile can increase GA performance and thus fuel savings more than constant speed limits during voyages. The results of GA also indicate that using high crossover rates and low mutations rates can increase fuel saving.Further research is recommended to include fuel and bunker prices to determine more accurate fuel efficiency.
|
44 |
Neural Networks for Modeling of Electrical Parameters and Losses in Electric VehicleFujimoto, Yo January 2023 (has links)
Permanent magnet synchronous machines have various advantages and have showed the most superiorperformance for Electric Vehicles. However, modeling them is difficult because of their nonlinearity. In orderto deal with the complexity, the artificial neural network and machine learning models including k-nearest neighbors, decision tree, random forest, and multiple linear regression with a quadratic model are developed to predict electrical parameters and losses as new prediction approaches for the performance of Volvo Cars’ electric vehicles and evaluate their performance. The test operation data of the Volvo Cars Corporation was used to extract and calculate the input and output data for each prediction model. In order to smooth the effects of each input variable, the input data was normalized. In addition, correlation matrices of normalized inputs were produced, which showed a high correlation between rotor temperature and winding resistance in the electrical parameter prediction dataset. They also demonstrated a strong correlation between the winding temperature and the rotor temperature in the loss prediction dataset.Grid search with 5-fold cross validation was implemented to optimize hyperparameters of artificial neuralnetwork and machine learning models. The artificial neural network models performed the best in MSE and R-squared scores for all the electrical parameters and loss prediction. The results indicate that artificial neural networks are more successful at handling complicated nonlinear relationships like those seen in electrical systems compared with other machine learning algorithms. Compared to other machine learning algorithms like decision trees, k-nearest neighbors, and multiple linear regression with a quadratic model, random forest produced superior results. With the exception of q-axis voltage, the decision tree model outperformed the knearestneighbors model in terms of parameter prediction, as measured by MSE and R-squared score. Multiple linear regression with a quadratic model produced the worst results for the electric parameters prediction because the relationship between the input and output was too complex for a multiple quadratic equation to deal with. Random forest models performed better than decision tree models because random forest ensemblehundreds of subset of decision trees and averaging the results. The k-nearest neighbors performed worse for almost all electrical parameters anticipation than the decision tree because it simply chooses the closest points and uses the average as the projected outputs so it was challenging to forecast complex nonlinear relationships. However, it is helpful for handling simple relationships and for understanding relationships in data. In terms of loss prediction, the k-nearest neighbors and decision tree produced similar results in MSE and R-squared score for the electric machine loss and the inverter loss. Their prediction results were worse than the multiple linear regression with a quadratic model, but they performed better than the multiple linear regression with a quadratic model, for forecasting the power difference between electromagnetic power and mechanical power.
|
45 |
Chemical Analysis, Databasing, and Statistical Analysis of Smokeless Powders for Forensic ApplicationDennis, Dana-Marie 01 January 2015 (has links)
Smokeless powders are a set of energetic materials, known as low explosives, which are typically utilized for reloading ammunition. There are three types which differ in their primary energetic materials; where single base powders contain nitrocellulose as their primary energetic material, double and triple base powders contain nitroglycerin in addition to nitrocellulose, and triple base powders also contain nitroguanidine. Additional organic compounds, while not proprietary to specific manufacturers, are added to the powders in varied ratios during the manufacturing process to optimize the ballistic performance of the powders. The additional compounds function as stabilizers, plasticizers, flash suppressants, deterrents, and opacifiers. Of the three smokeless powder types, single and double base powders are commercially available, and have been heavily utilized in the manufacture of improvised explosive devices. Forensic smokeless powder samples are currently analyzed using multiple analytical techniques. Combined microscopic, macroscopic, and instrumental techniques are used to evaluate the sample, and the information obtained is used to generate a list of potential distributors. Gas chromatography – mass spectrometry (GC-MS) is arguably the most useful of the instrumental techniques since it distinguishes single and double base powders, and provides additional information about the relative ratios of all the analytes present in the sample. However, forensic smokeless powder samples are still limited to being classified as either single or double base powders, based on the absence or presence of nitroglycerin, respectively. In this work, the goal was to develop statistically valid classes, beyond the single and double base designations, based on multiple organic compounds which are commonly encountered in commercial smokeless powders. Several chemometric techniques were applied to smokeless powder GC-MS data for determination of the classes, and for assignment of test samples to these novel classes. The total ion spectrum (TIS), which is calculated from the GC-MS data for each sample, is obtained by summing the intensities for each mass-to-charge (m/z) ratio across the entire chromatographic profile. A TIS matrix comprising data for 726 smokeless powder samples was subject to agglomerative hierarchical cluster (AHC) analysis, and six distinct classes were identified. Within each class, a single m/z ratio had the highest intensity for the majority of samples, though the m/z ratio was not always unique to the specific class. Based on these observations, a new classification method known as the Intense Ion Rule (IIR) was developed and used for the assignment of test samples to the AHC designated classes. Discriminant models were developed for assignment of test samples to the AHC designated classes using k-Nearest Neighbors (kNN) and linear and quadratic discriminant analyses (LDA and QDA, respectively). Each of the models were optimized using leave-one-out (LOO) and leave-group-out (LGO) cross-validation, and the performance of the models was evaluated by calculating correct classification rates for assignment of the cross-validation (CV) samples to the AHC designated classes. The optimized models were utilized to assign test samples to the AHC designated classes. Overall, the QDA LGO model achieved the highest correct classification rates for assignment of both the CV samples and the test samples to the AHC designated classes. In forensic application, the goal of an explosives analyst is to ascertain the manufacturer of a smokeless powder sample. In addition, knowledge about the probability of a forensic sample being produced by a specific manufacturer could potentially decrease the time invested by an analyst during investigation by providing a shorter list of potential manufacturers. In this work, Bayes* Theorem and Bayesian Networks were investigated as an additional tool to be utilized in forensic casework. Bayesian Networks were generated and used to calculate posterior probabilities of a test sample belonging to specific manufacturers. The networks were designed to include manufacturer controlled powder characteristics such as shape, color, and dimension; as well as, the relative intensities of the class associated ions determined from cluster analysis. Samples were predicted to belong to a manufacturer based on the highest posterior probability. Overall percent correct rates were determined by calculating the percentage of correct predictions; that is, where the known and predicted manufacturer were the same. The initial overall percent correct rate was 66%. The dimensions of the smokeless powders were added to the network as average diameter and average length nodes. Addition of average diameter and length resulted in an overall prediction rate of 70%.
|
46 |
Валидация модели машинного обучения для прогнозирования магнитных свойств нанокристаллических сплавов типа FINEMET : магистерская диссертация / Validation of machine learning model to predict magnetic properties of nanocrystalline FINEMET type alloysСтепанова, К. А., Stepanova, K. A. January 2022 (has links)
В работе была произведена разработка модели машинного обучения на языке программирования Python, а также проведена ее валидация на этапах жизненного цикла. Целью создания модели машинного обучения является прогнозирование магнитных свойств нанокристаллических сплавов на основе железа по химическому составу и условиям обработки. Процесс валидации модели машинного обучения позволяет не только произвести контроль за соблюдением требований, предъявляемых при разработке и эксплуатации модели, к результатам, полученных с помощью моделирования, но и способствует внедрению модели в процесс производства. Процесс валидации включал в себя валидацию данных, в ходе которой были оценены типы, пропуски данных, соответствие цели исследования, распределения признаков и целевых характеристик, изучены корреляции признаков и целевых характеристик; валидацию алгоритмов, применяемых в модели: были проанализированы параметры алгоритмов с целью соблюдения требования о корректной обобщающей способности модели (отсутствие недо- и переобучения); оценку работы модели, благодаря которой был произведен анализ полученных результатов с помощью тестовых данных; верификацию результатов с помощью актуальных данных, полученных из статей, опубликованных с 2010 по 2022 год. В результате валидации модели было показано высокое качество разработанной модели, позволяющее получить оценки качества R2 0,65 и выше. / In this work machine learning model was developed by Python programming language, and also was validated at stages of model’s life cycle. The purpose of creating the machine learning model is to predict the magnetic properties of Fe-based nanocrystalline alloys by chemical composition and processing conditions. The validation of machine learning models allows not only to control the requirements for development and operation of the models, for the results obtained by modeling, but also contrib¬utes to the introduction of the model into production process. The validation process included: data validation: data types and omissions, compliance with the purpose of the study, dis¬tribution of features and target characteristics were evaluated, correlations of features and target characteristics were studied; flgorithms validation: the parameters of the algorithms were analyzed in order to comply with the requirement for the correct generalizing ability of the model (without under- and overfit¬ting); evaluation of the model work: the analysis of the obtained results was carried out using test data; verification of results using actual data obtained from articles published since 2010 to 2022. As a result of the model validation, the high quality of the developed model was shown, which makes it possible to obtain quality metric R2 0.65 and higher.
|
47 |
Toward an application of machine learning for predicting foreign trade in services – a pilot study for Statistics SwedenUnnebäck, Tea January 2023 (has links)
The objective of this thesis is to investigate the possibility of using machine learn- ing at Statistics Sweden within the Foreign Trade in Services (FTS) statistic, to predict the likelihood of a unit to conduct foreign trade in services. The FTS survey is a sample survey, for which there is no natural frame to sample from. Therefore, prior to sampling a frame is manually constructed each year, starting with a register of all Swedish companies and agencies and in a rule- based manner narrowing it down to contain only what is classified as units likely to trade in services during the year to come. An automatic procedure that would enable reliable predictions is requested. To this end, three different machine learning methods have been analyzed, two rule- based methods (random forest and extreme gradient boosting) and one distance- based method (k nearest neighbors). The models arising from these methods are trained and tested on historically sampled units, for which it is known whether they did trade or not. The results indicate that the two rule-based methods perform well in classifying likely traders. The random forest model is better at finding traders, while the extreme gradient boosting model is better at finding non-traders. The results also indicate interesting patterns when studying different metrics for the models. The results also indicate that when training the rule-based models, the year in which the training data was sampled needs to be taken into account. This entails that cross-validation with random folds should not be used, but rather grouped cross-validation based on year. By including a feature that mirror the state of the economy, the model can adapt its rules to this, meaning that the rules learned on training data can be extended to years beyond training data. Based on the observed results, the final recommendation is to further develop and investigate the performance of the random forest model.
|
48 |
Analyzing the Need for Nonprofits in the Housing Sector: A Predictive Model Based on LocationOerther, Catie 03 August 2023 (has links)
No description available.
|
49 |
Travel time estimation in congested urban networks using point detectors dataMahmoud, Anas Mohammad 02 May 2009 (has links)
A model for estimating travel time on short arterial links of congested urban networks, using currently available technology, is introduced in this thesis. The objective is to estimate travel time, with an acceptable level of accuracy for real-life traffic problems, such as congestion management and emergency evacuation. To achieve this research objective, various travel time estimation methods, including highway trajectories, multiple linear regression (MLR), artificial neural networks (ANN) and K –nearest neighbor (K-NN) were applied and tested on the same dataset. The results demonstrate that ANN and K-NN methods outperform linear methods by a significant margin, also, show particularly good performance in detecting congested intervals. To ensure the quality of the analysis results, set of procedures and algorithms based on traffic flow theory and test field information, were introduced to validate and clean the data used to build, train and test the different models.
|
50 |
Predicting Customer Churn in a Subscription-Based E-Commerce Platform Using Machine Learning TechniquesAljifri, Ahmed January 2024 (has links)
This study investigates the performance of Logistic Regression, k-Nearest Neighbors (KNN), and Random Forest algorithms in predicting customer churn within an e-commerce platform. The choice of the mentioned algorithms was due to the unique characteristics of the dataset and the unique perception and value provided by each algorithm. Iterative models ‘examinations, encompassing preprocessing techniques, feature engineering, and rigorous evaluations, were conducted. Logistic Regression showcased moderate predictive capabilities but lagged in accurately identifying potential churners due to its assumptions of linearity between log odds and predictors. KNN emerged as the most accurate classifier, achieving superior sensitivity and specificity (98.22% and 96.35%, respectively), outperforming other models. Random Forest, with sensitivity and specificity (91.75% and 95.83% respectively) excelled in specificity but slightly lagged in sensitivity. Feature importance analysis highlighted "Tenure" as the most impactful variable for churn prediction. Preprocessing techniques differed in performance across models, emphasizing the importance of tailored preprocessing. The study's findings underscore the significance of continuous model refinement and optimization in addressing complex business challenges like customer churn. The insights serve as a foundation for businesses to implement targeted retention strategies, mitigating customer attrition, and promote growth in e-commerce platforms.
|
Page generated in 0.0609 seconds