71
Chemical Analysis, Databasing, and Statistical Analysis of Smokeless Powders for Forensic Application. Dennis, Dana-Marie, 01 January 2015
Smokeless powders are a set of energetic materials, known as low explosives, which are typically utilized for reloading ammunition. There are three types, which differ in their primary energetic materials: single base powders contain nitrocellulose as their primary energetic material, double and triple base powders contain nitroglycerin in addition to nitrocellulose, and triple base powders also contain nitroguanidine. Additional organic compounds, while not proprietary to specific manufacturers, are added to the powders in varied ratios during the manufacturing process to optimize ballistic performance. These compounds function as stabilizers, plasticizers, flash suppressants, deterrents, and opacifiers. Of the three smokeless powder types, single and double base powders are commercially available and have been heavily utilized in the manufacture of improvised explosive devices. Forensic smokeless powder samples are currently analyzed using multiple analytical techniques. Combined microscopic, macroscopic, and instrumental techniques are used to evaluate the sample, and the information obtained is used to generate a list of potential distributors. Gas chromatography – mass spectrometry (GC-MS) is arguably the most useful of the instrumental techniques, since it distinguishes single and double base powders and provides additional information about the relative ratios of all the analytes present in the sample. However, forensic smokeless powder samples are still limited to being classified as either single or double base powders, based on the absence or presence of nitroglycerin, respectively. In this work, the goal was to develop statistically valid classes, beyond the single and double base designations, based on multiple organic compounds which are commonly encountered in commercial smokeless powders. Several chemometric techniques were applied to smokeless powder GC-MS data to determine the classes and to assign test samples to these novel classes. The total ion spectrum (TIS), calculated from the GC-MS data for each sample, is obtained by summing the intensities of each mass-to-charge (m/z) ratio across the entire chromatographic profile. A TIS matrix comprising data for 726 smokeless powder samples was subjected to agglomerative hierarchical cluster (AHC) analysis, and six distinct classes were identified. Within each class, a single m/z ratio had the highest intensity for the majority of samples, though that m/z ratio was not always unique to the specific class. Based on these observations, a new classification method known as the Intense Ion Rule (IIR) was developed and used to assign test samples to the AHC-designated classes. Discriminant models were also developed for this assignment using k-Nearest Neighbors (kNN) and linear and quadratic discriminant analyses (LDA and QDA, respectively). Each of the models was optimized using leave-one-out (LOO) and leave-group-out (LGO) cross-validation, and performance was evaluated by calculating correct classification rates for assignment of the cross-validation (CV) samples to the AHC-designated classes. The optimized models were then utilized to assign test samples to the classes. Overall, the QDA LGO model achieved the highest correct classification rates for assignment of both the CV samples and the test samples.
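As a rough illustration of this pipeline, the sketch below simulates GC-MS data, computes a TIS per sample, clusters the TIS matrix into six classes with AHC, and scores a QDA model under leave-group-out cross-validation. Everything except the sample count, class count, and method names is an assumption (simulated intensities, arbitrary group labels, a deliberately reduced m/z range); it is a minimal sketch of the workflow, not the thesis code.

```python
# Sketch of the TIS -> AHC -> QDA (leave-group-out) pipeline described above.
# Data are simulated; only the 726-sample count, the six classes, and the
# method names come from the abstract.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)

# Simulated GC-MS data: samples x chromatographic scans x m/z channels
# (the m/z range is kept small here for brevity).
n_samples, n_scans, n_mz = 726, 200, 20
gcms = rng.random((n_samples, n_scans, n_mz))

# Total ion spectrum: sum each m/z channel across the chromatographic profile.
tis = gcms.sum(axis=1)                      # shape (726, 20)
tis = tis / tis.sum(axis=1, keepdims=True)  # normalize per sample

# Agglomerative hierarchical clustering into six classes.
ahc = AgglomerativeClustering(n_clusters=6, linkage="ward")
classes = ahc.fit_predict(tis)

# QDA scored with leave-group-out cross-validation (groups are arbitrary here;
# in the study they would reflect how samples were batched).
groups = rng.integers(0, 10, size=n_samples)
qda = QuadraticDiscriminantAnalysis()
scores = cross_val_score(qda, tis, classes, groups=groups,
                         cv=GroupKFold(n_splits=10))
print(f"mean LGO correct classification rate: {scores.mean():.3f}")
```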
In forensic applications, the goal of an explosives analyst is to ascertain the manufacturer of a smokeless powder sample. Knowledge about the probability of a forensic sample being produced by a specific manufacturer could also decrease the time invested by an analyst during an investigation by providing a shorter list of potential manufacturers. In this work, Bayes' Theorem and Bayesian Networks were investigated as an additional tool to be utilized in forensic casework. Bayesian Networks were generated and used to calculate the posterior probabilities of a test sample belonging to specific manufacturers. The networks were designed to include manufacturer-controlled powder characteristics such as shape, color, and dimension, as well as the relative intensities of the class-associated ions determined from cluster analysis. Samples were assigned to the manufacturer with the highest posterior probability. Overall percent correct rates were determined as the percentage of predictions for which the known and predicted manufacturer were the same. The initial overall percent correct rate was 66%. The dimensions of the smokeless powders were then added to the network as average diameter and average length nodes, which raised the overall prediction rate to 70%.
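The core of the posterior calculation can be illustrated with a stripped-down application of Bayes' theorem. The sketch below assumes a naive (conditionally independent) factorization over two hypothetical characteristics, shape and color; the manufacturers and all probabilities are invented for illustration, and a full Bayesian network would encode dependencies between nodes rather than assume independence.

```python
# Minimal sketch of the posterior-probability idea: Bayes' theorem with a
# naive (conditionally independent) factorization over powder characteristics.
# Manufacturers, features, and all probabilities below are hypothetical.
import numpy as np

manufacturers = ["A", "B", "C"]
prior = np.array([0.5, 0.3, 0.2])            # P(manufacturer)

# P(shape = flake | m) and P(color = gray | m) for each manufacturer.
p_shape_given_m = np.array([0.7, 0.2, 0.4])
p_color_given_m = np.array([0.6, 0.5, 0.1])

# Posterior for an unknown sample observed as (flake, gray).
likelihood = p_shape_given_m * p_color_given_m
posterior = prior * likelihood
posterior /= posterior.sum()

for m, p in zip(manufacturers, posterior):
    print(f"P({m} | flake, gray) = {p:.3f}")
# The predicted manufacturer is the one with the highest posterior.
print("prediction:", manufacturers[int(np.argmax(posterior))])
```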
72
Validation of a machine learning model to predict magnetic properties of nanocrystalline FINEMET type alloys (master's thesis). Stepanova, K. A., January 2022
In this work, a machine learning model was developed in the Python programming language and validated at the stages of its life cycle. The purpose of the model is to predict the magnetic properties of Fe-based nanocrystalline alloys from their chemical composition and processing conditions. Validating a machine learning model not only enforces the requirements, imposed during the model's development and operation, on the results obtained by modeling, but also eases the introduction of the model into the production process. The validation process included: data validation, in which data types and omissions, compliance with the purpose of the study, and the distributions of features and target characteristics were evaluated, and correlations between features and target characteristics were studied; validation of the algorithms used in the model, whose parameters were analyzed to ensure correct generalizing ability (absence of under- and overfitting); evaluation of model performance, in which the results were analyzed using test data; and verification of the results against current data drawn from articles published from 2010 to 2022. The validation demonstrated the high quality of the developed model, which achieves R² scores of 0.65 and higher.
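A minimal sketch of these validation steps, assuming a stand-in dataset and model (the features, target, and figures below are illustrative, not the thesis data), might look as follows in Python with scikit-learn:

```python
# Illustrative sketch of the validation steps described above: data checks,
# an under-/overfitting comparison, and the R^2 acceptance criterion.
# The dataset, features, and target are simulated stand-ins.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
# Hypothetical features: composition (at.%) and an annealing temperature.
df = pd.DataFrame({
    "Fe": rng.uniform(70, 80, 300),
    "Si": rng.uniform(5, 15, 300),
    "anneal_T": rng.uniform(500, 600, 300),
})
# Hypothetical target with a known dependence plus noise.
df["coercivity"] = 0.1 * df["anneal_T"] - 0.5 * df["Si"] + rng.normal(0, 1, 300)

# Data validation: confirm there are no missing values before modeling.
assert df.isna().sum().sum() == 0, "unexpected missing values"

X, y = df.drop(columns="coercivity"), df["coercivity"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
r2_train = r2_score(y_tr, model.predict(X_tr))
r2_test = r2_score(y_te, model.predict(X_te))
# A large train/test gap suggests overfitting; the abstract's acceptance
# criterion corresponds to a held-out R^2 of at least 0.65.
print(f"train R2 = {r2_train:.2f}, test R2 = {r2_test:.2f}")
```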
73
Toward an application of machine learning for predicting foreign trade in services – a pilot study for Statistics Sweden. Unnebäck, Tea, January 2023
The objective of this thesis is to investigate the possibility of using machine learning at Statistics Sweden within the Foreign Trade in Services (FTS) statistic, to predict the likelihood that a unit conducts foreign trade in services. The FTS survey is a sample survey for which there is no natural frame to sample from. Therefore, prior to sampling, a frame is manually constructed each year, starting with a register of all Swedish companies and agencies and, in a rule-based manner, narrowing it down to contain only units classified as likely to trade in services during the year to come. An automatic procedure that would enable reliable predictions is requested. To this end, three different machine learning methods have been analyzed: two rule-based methods (random forest and extreme gradient boosting) and one distance-based method (k-nearest neighbors). The models arising from these methods are trained and tested on historically sampled units for which it is known whether they traded or not. The results indicate that the two rule-based methods perform well in classifying likely traders. The random forest model is better at finding traders, while the extreme gradient boosting model is better at finding non-traders. The results also reveal interesting patterns when different metrics for the models are studied. Furthermore, when training the rule-based models, the year in which the training data was sampled needs to be taken into account. This entails that cross-validation with random folds should not be used, but rather grouped cross-validation based on year, as sketched below. By including a feature that mirrors the state of the economy, the model can adapt its rules accordingly, meaning that the rules learned on training data can be extended to years beyond the training data. Based on the observed results, the final recommendation is to further develop and investigate the performance of the random forest model.
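A minimal sketch of that year-grouped cross-validation follows; the features, the synthetic trader/non-trader label, and the year range are invented assumptions, with GroupKFold playing the role of the year-based folds.

```python
# Sketch of the year-grouped cross-validation the study recommends, so that
# folds never mix sampling years. Data and feature meanings are made up.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(2)
n = 2000
X = np.column_stack([
    rng.normal(size=n),            # e.g. turnover
    rng.normal(size=n),            # e.g. number of employees
    rng.uniform(-2, 2, size=n),    # e.g. state-of-the-economy indicator
])
# Synthetic trader (1) / non-trader (0) label driven partly by the economy.
y = (X[:, 0] + X[:, 2] + rng.normal(0, 0.5, n) > 0).astype(int)
years = rng.integers(2016, 2022, size=n)   # sampling year of each unit

model = RandomForestClassifier(n_estimators=200, random_state=0)
# Each fold holds out whole years, mimicking prediction for an unseen year.
scores = cross_val_score(model, X, y, groups=years, cv=GroupKFold(n_splits=6))
print("accuracy per held-out year group:", np.round(scores, 3))
```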
74
Analyzing the Need for Nonprofits in the Housing Sector: A Predictive Model Based on Location. Oerther, Catie, 03 August 2023
No description available.
75
Travel time estimation in congested urban networks using point detectors data. Mahmoud, Anas Mohammad, 02 May 2009
A model for estimating travel time on short arterial links of congested urban networks, using currently available technology, is introduced in this thesis. The objective is to estimate travel time with a level of accuracy acceptable for real-life traffic problems, such as congestion management and emergency evacuation. To achieve this objective, various travel time estimation methods, including highway trajectories, multiple linear regression (MLR), artificial neural networks (ANN), and k-nearest neighbors (K-NN), were applied and tested on the same dataset. The results demonstrate that the ANN and K-NN methods outperform the linear methods by a significant margin and show particularly good performance in detecting congested intervals; a comparison along these lines is sketched below. To ensure the quality of the analysis results, a set of procedures and algorithms based on traffic flow theory and field-test information was introduced to validate and clean the data used to build, train, and test the different models.
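In the sketch below, the detector features, the nonlinear ground-truth travel time, and all parameter choices are illustrative assumptions rather than the thesis setup; it only demonstrates why nonlinear learners (ANN, K-NN) can outperform MLR on congestion-shaped data.

```python
# Hedged sketch comparing the model families tested above (MLR, ANN, K-NN)
# on simulated point-detector data with a nonlinear travel-time signal.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(3)
n = 1500
occupancy = rng.uniform(0, 1, n)     # detector occupancy
volume = rng.uniform(0, 40, n)       # vehicles per interval
speed = rng.uniform(5, 60, n)        # spot speed (km/h)
X = np.column_stack([occupancy, volume, speed])
# Nonlinear ground truth: minutes per km grow sharply under congestion.
y = 60 / speed + 3 * occupancy**2 + rng.normal(0, 0.1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "MLR": LinearRegression(),
    "ANN": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32,),
                                      max_iter=2000, random_state=0)),
    "K-NN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    print(f"{name}: MAE = {mean_absolute_error(y_te, m.predict(X_te)):.3f} min")
```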
76
Predicting Customer Churn in a Subscription-Based E-Commerce Platform Using Machine Learning Techniques. Aljifri, Ahmed, January 2024
This study investigates the performance of Logistic Regression, k-Nearest Neighbors (KNN), and Random Forest algorithms in predicting customer churn within an e-commerce platform. These algorithms were chosen because of the characteristics of the dataset and the distinct perspective and value provided by each algorithm. Iterative model examinations, encompassing preprocessing techniques, feature engineering, and rigorous evaluations, were conducted. Logistic Regression showed moderate predictive capability but lagged in accurately identifying potential churners, owing to its assumption of linearity between the log odds and the predictors. KNN emerged as the most accurate classifier, achieving superior sensitivity and specificity (98.22% and 96.35%, respectively) and outperforming the other models. Random Forest, with sensitivity and specificity of 91.75% and 95.83% respectively, excelled in specificity but lagged slightly in sensitivity. Feature importance analysis highlighted "Tenure" as the most impactful variable for churn prediction. Preprocessing techniques differed in performance across models, emphasizing the importance of tailored preprocessing. The study's findings underscore the significance of continuous model refinement and optimization in addressing complex business challenges like customer churn. The insights serve as a foundation for businesses to implement targeted retention strategies, mitigate customer attrition, and promote growth in e-commerce platforms.
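The sensitivity and specificity figures above are read off a confusion matrix. The sketch below shows that computation for a KNN classifier on synthetic, imbalanced churn-like data; the features, class balance, and neighbor count are assumptions, not the study's dataset or tuned settings.

```python
# Minimal sketch of the churn evaluation: sensitivity and specificity
# computed from a confusion matrix for a KNN classifier. Data are synthetic.
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Imbalanced two-class problem: label 1 marks a churner (~20% of samples).
X, y = make_classification(n_samples=3000, n_features=10,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
sensitivity = tp / (tp + fn)   # share of true churners caught
specificity = tn / (tn + fp)   # share of non-churners correctly kept out
print(f"sensitivity = {sensitivity:.2%}, specificity = {specificity:.2%}")
```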
77
Estudo do campo cristalino em óxidos contendo íons európio / Study of the crystal field in oxides containing europium ions. Santana, Pedro Jonathan Santos, 01 March 2013
In this work the Point Charge Electrostatic Model (PCEM), the Simple Overlap Model (SOM), and the Method of Equivalent Nearest Neighbors (MENN) were applied to a well-known series of oxides, namely Gd2O3, Y2O3, Lu2O3, In2O3, and Sc2O3, all doped with the Eu3+ ion, with the purpose of discussing the charge of interaction and some aspects of the crystal field effect. To this end, the crystal field parameters, the crystal field strength parameter, and the splitting of the 7F1 level of the luminescent ion were calculated. Using the local structure of the luminescent site, the PCEM, as expected, led to results that were satisfactory only from a qualitative point of view. With the SOM and the MENN it was possible to reproduce the experimental splitting of the 7F1 energy level and its sublevels with physically acceptable charge factors; only in a few cases was the nearest-neighbor charge greater than the respective valence. A discussion of the possible position of this charge of interaction is also given.
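For orientation, the expressions below give commonly quoted forms of the PCEM and SOM crystal field parameters and of the crystal field strength parameter; sign and normalization conventions vary between authors, so these should be read as a sketch of the models' structure, not the exact expressions used in this work. Here $g_j$ are the charge factors, $R_j$ the nearest-neighbor distances, and $\rho_j$ the 4f-ligand overlap.

```latex
% PCEM lattice sum over the nearest neighbors j (conventions vary):
\[
  B_q^k \;=\; e^2 \langle r^k \rangle \sum_j
      \frac{g_j}{R_j^{\,k+1}}
      \sqrt{\frac{4\pi}{2k+1}}\; Y_{kq}^{*}(\theta_j,\varphi_j).
\]
% The SOM scales each ligand contribution by the 4f-ligand overlap rho_j:
\[
  B_q^k(\mathrm{SOM}) \;=\; e^2 \langle r^k \rangle \sum_j
      \rho_j\,(2\beta_j)^{k+1}\,
      \frac{g_j}{R_j^{\,k+1}}
      \sqrt{\frac{4\pi}{2k+1}}\; Y_{kq}^{*}(\theta_j,\varphi_j),
  \qquad \beta_j = \frac{1}{1 \pm \rho_j}.
\]
% A common definition of the crystal field strength parameter:
\[
  N_v \;=\; \Big[\sum_{k,q} \frac{4\pi}{2k+1}\,\lvert B_q^k \rvert^2\Big]^{1/2}.
\]
```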
78
Price Prediction of Vinyl Records Using Machine Learning Algorithms. Johansson, David, January 2020
Machine learning algorithms have been used for price prediction within several application areas. Examples include real estate, the stock market, tourist accommodation, electricity, art, cryptocurrencies, and fine wine. Common approaches in such studies are to evaluate the accuracy of predictions and to compare different algorithms, such as Linear Regression or Neural Networks. There is a thriving global second-hand market for vinyl records, but research on price prediction in this area is very limited. The purpose of this project was to build on existing knowledge of price prediction in general and evaluate some aspects of price prediction for vinyl records, including the achievable level of accuracy and the relative efficiency of different algorithms. A dataset of 37,000 vinyl record samples was created with data from the Discogs website, and multiple machine learning algorithms were utilized in a controlled experiment (a sketch of such an experiment follows). Among the conclusions drawn from the results were that the Random Forest algorithm generally produced the strongest results, that results can vary substantially between different artists or genres, and that a large share of the predictions reached a good level of accuracy, while a relatively small number of large errors had a considerable effect on the overall results.
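In the sketch below, the record attributes, the synthetic price signal, and the demand proxy are assumptions standing in for the Discogs-derived dataset; the median error is reported alongside the mean because, as the conclusions note, a few large errors can dominate averaged results.

```python
# Illustrative sketch of the experiment: encode categorical record attributes
# and fit a Random Forest price model on synthetic data.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, median_absolute_error

rng = np.random.default_rng(4)
n = 5000
df = pd.DataFrame({
    "genre": rng.choice(["rock", "jazz", "electronic"], n),
    "year": rng.integers(1955, 2020, n),
    "condition": rng.choice(["M", "VG+", "VG", "G"], n),
    "want_have_ratio": rng.lognormal(0, 1, n),   # hypothetical demand proxy
})
# Synthetic price: demand-driven, scaled by record condition, plus noise.
cond_factor = df["condition"].map({"M": 1.5, "VG+": 1.2, "VG": 1.0, "G": 0.7})
price = (5 + 10 * df["want_have_ratio"]) * cond_factor + rng.lognormal(0, 1, n)

pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["genre", "condition"])],
    remainder="passthrough",
)
model = make_pipeline(pre, RandomForestRegressor(n_estimators=200, random_state=0))
X_tr, X_te, y_tr, y_te = train_test_split(df, price, random_state=0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
# The median error is robust to the few large misses noted in the conclusions.
print(f"MAE = {mean_absolute_error(y_te, pred):.2f}, "
      f"median AE = {median_absolute_error(y_te, pred):.2f}")
```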
79
Využití umělé inteligence v technické diagnostice / Utilization of artificial intelligence in technical diagnostics. Konečný, Antonín, January 2021
The diploma thesis is focused on the use of artificial intelligence methods for evaluating the fault condition of machinery. The evaluated data come from a vibrodiagnostic model for the simulation of static and dynamic unbalances. Machine learning methods are applied, specifically supervised learning. The thesis describes the Spyder software environment and its alternatives, as well as the Python programming language in which the scripts are written. It contains an overview, with descriptions, of the libraries (Scikit-learn, SciPy, Pandas, ...) and methods used: k-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Trees (DT), and Random Forest Classifiers (RF). The classification results are visualized in a confusion matrix for each method (see the sketch below). The appendix includes the scripts written for feature engineering, hyperparameter tuning, evaluation of learning success, and classification with visualization of the result.
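A sketch of that evaluation loop follows; the simulated three-class data stand in for the vibrodiagnostic features, and the model settings are scikit-learn defaults rather than the tuned hyperparameters from the thesis.

```python
# Sketch of the evaluation described above: four classifiers compared via
# confusion matrices on simulated three-class data (OK, static unbalance,
# dynamic unbalance as stand-in machine states).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=1200, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    # Rows: true class; columns: predicted class.
    print(name, "\n", confusion_matrix(y_te, clf.predict(X_te)))
```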
80
Využití metod dolování dat pro analýzu sociálních sítí / Using Data Mining Methods for Analysis of Social Networks. Novosad, Andrej, January 2013
This thesis discusses data mining of social media. It gives an introduction to the topic of data mining and possible mining methods. The thesis also explores social media and social networks, what they are able to offer, and what problems they bring. Three different APIs of three social networking sites are examined along with the opportunities they provide for data mining. Techniques of text mining and document classification are explored. An implementation of a web application that mines data from the social site Twitter using the SVM algorithm is described; a minimal sketch of such a classifier follows. The implemented application classifies tweets based on their text, with classes representing the tweets' continents of origin. Several experiments executed both in the RapidMiner software and in the implemented web application are then presented and their results examined.
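The sketch below assumes invented tweets and uses TF-IDF features with a linear SVM, one plausible realization of an SVM text classifier, not necessarily the thesis configuration.

```python
# Minimal sketch of the tweet classifier: TF-IDF features and a linear SVM,
# with continents as class labels. The tweets below are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

tweets = [
    "watching the sunrise over sydney harbour",
    "stuck in traffic on the m25 again",
    "enjoying poutine in montreal tonight",
    "celebrating carnival in rio de janeiro",
]
continents = ["Oceania", "Europe", "North America", "South America"]

# Word and bigram TF-IDF features feed a linear support vector classifier.
clf = make_pipeline(TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
                    LinearSVC())
clf.fit(tweets, continents)
print(clf.predict(["heading to the beach near sydney"]))
```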