21 |
A GENE ONTOLOGY BASED COMPUTATIONAL APPROACH FOR THE PREDICTION OF PROTEIN FUNCTIONS
Kharsikar, Saket. 13 September 2007.
No description available.
|
22 |
A Direct Algorithm for the K-Nearest-Neighbor Classifier via Local Warping of the Distance Metric
Neo, TohKoon. 30 November 2007.
The k-nearest neighbor (k-NN) pattern classifier is a simple yet effective learner. However, it has a few drawbacks, one of which is its large model size. A number of algorithms can condense the k-NN model, but at the expense of accuracy. Boosting is therefore desirable for increasing the accuracy of these condensed models. Unfortunately, no existing boosting algorithm works well with k-NN directly. We present a direct boosting algorithm for the k-NN classifier that creates an ensemble of models with locally modified distance weighting. An empirical study on 10 standard datasets from the UCI repository shows that this new Boosted k-NN algorithm increases generalization accuracy on the majority of the datasets and never performs worse than standard k-NN.
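The abstract does not give the update rule, so the following is only a hypothetical sketch of what "locally modified distance weighting" could look like, not the thesis's algorithm. Each training point carries a warp factor that multiplies its distance to a query. In each boosting round, every training point is classified leave-one-out; when it is misclassified, its wrong-class neighbors are pushed away and its same-class neighbors pulled closer. Each round's warp vector becomes one ensemble member. The names `boost_knn`, `step` and `rounds` are illustrative assumptions.

```python
import math
from collections import Counter


def warped_knn_predict(X, y, warp, query, k, skip=None):
    """k-NN vote where each training point's distance is multiplied by its warp factor."""
    scored = []
    for i, xi in enumerate(X):
        if i == skip:
            continue  # leave-one-out: a training point may not vote for itself
        scored.append((math.dist(query, xi) * warp[i], i))
    scored.sort()
    neighbors = [i for _, i in scored[:k]]
    label = Counter(y[i] for i in neighbors).most_common(1)[0][0]
    return label, neighbors


def boost_knn(X, y, k=3, rounds=5, step=0.1):
    """Hypothetical boosting loop: one model (a vector of warp factors) per round."""
    warp = [1.0] * len(X)
    ensemble = []
    for _ in range(rounds):
        new_warp = list(warp)
        for j, (xj, yj) in enumerate(zip(X, y)):
            pred, neighbors = warped_knn_predict(X, y, warp, xj, k, skip=j)
            if pred != yj:
                for i in neighbors:
                    # push wrong-class neighbors away, pull same-class neighbors closer
                    new_warp[i] *= (1 + step) if y[i] != yj else (1 - step)
        warp = new_warp
        ensemble.append(list(warp))
    return ensemble


def ensemble_predict(X, y, ensemble, query, k=3):
    """Majority vote over the k-NN predictions of every warped model."""
    votes = Counter(warped_knn_predict(X, y, w, query, k)[0] for w in ensemble)
    return votes.most_common(1)[0][0]
```

Because the ensemble stores only one warp vector per round over the same training points, this design keeps the extra memory linear in the number of rounds.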
|
23 |
Generating Exploration Mission-3 Trajectories to a 9:2 NRHO Using Machine Learning
Guzman, Esteban. 01 December 2018.
The purpose of this thesis is to design a machine learning platform that expands knowledge of mission availability across a launch season by improving trajectory resolution and introducing launch mission forecasting. The specific scenario addressed is one in which data is provided for four deterministic translational maneuvers on a mission to a Near Rectilinear Halo Orbit (NRHO) with a 9:2 synodic frequency. Launch availability knowledge within NASA’s Orion Orbit Performance Team is currently established by altering the optimization variables associated with given reference launch epochs. This method can be an abstract task and relies on an orbit analyst to structure a mission based on an established mission design methodology tied to the performance of Orion and NASA's Space Launch System. A machine learning algorithm trained to construct mission scenarios within the feasible range of known trajectories reduces the required interaction of the orbit analyst, because it removes the step of optimizing the orbit to fit the translational response expected of the spacecraft. In this study, k-Nearest Neighbor and Bayesian Linear Regression successfully predicted classical orbital elements for the launch windows observed. However, both algorithms had limitations due to their approaches to model fitting. Training machine learning algorithms on classical orbital elements introduced a repetitive approach to reconstructing mission segments for different arrival opportunities across the launch window, and it can prove to be a viable method of generating launch window scans for future missions.
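To illustrate the Bayesian Linear Regression mentioned above, here is a minimal sketch with one input feature, assuming a zero-mean Gaussian prior on the weights (precision `alpha`) and Gaussian noise (precision `beta`). In the thesis setting, `x` might be a launch-epoch offset and `t` one classical orbital element, but that mapping, the parameter values and the function names are assumptions for illustration. The posterior is computed in closed form: the precision matrix is A = αI + βΦᵀΦ with rows Φ = [1, x], the covariance is S = A⁻¹, and the mean is m = βSΦᵀt.

```python
def bayes_linreg(xs, ts, alpha=1e-3, beta=25.0):
    """Posterior over (intercept, slope) for t = w0 + w1*x + noise, prior w ~ N(0, I/alpha)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # Posterior precision A = alpha*I + beta * Phi^T Phi
    a11, a12, a22 = alpha + beta * n, beta * sx, alpha + beta * sxx
    det = a11 * a22 - a12 * a12
    # Posterior covariance S = A^-1 (closed-form inverse of a symmetric 2x2 matrix)
    s11, s12, s22 = a22 / det, -a12 / det, a11 / det
    # Phi^T t
    b1 = sum(ts)
    b2 = sum(x * t for x, t in zip(xs, ts))
    # Posterior mean m = beta * S * Phi^T t
    m0 = beta * (s11 * b1 + s12 * b2)
    m1 = beta * (s12 * b1 + s22 * b2)
    return (m0, m1), ((s11, s12), (s12, s22))


def predictive(x, mean, cov, beta=25.0):
    """Predictive mean and variance at x: noise variance plus weight uncertainty."""
    (m0, m1), ((s11, s12), (_, s22)) = mean, cov
    mu = m0 + m1 * x
    var = 1.0 / beta + s11 + 2 * s12 * x + s22 * x * x
    return mu, var
```

Unlike k-NN regression, this model returns a predictive variance, which grows for inputs far from the training data.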
|
24 |
Uncertainty Analysis : Severe Accident Scenario at a Nordic Nuclear Power Plant
Hedly, Josefin; De Young, Mikaela. January 2023.
Nuclear Power Plants (NPP) undergo fault and sensitivity analysis with scenario modelling to predict catastrophic events, specifically releases of Cesium-137 (Cs-137). The purpose of this thesis is to find which of 108 input features from the Modular Accident Analysis Program (MAAP) simulation code are important when there is a large release of Cs-137. The features are tested all together and within their groupings. To find important features, the thesis uses the machine learning (ML) model Random Forest (RF), which has a built-in attribute that identifies important features. The RF classification results are corroborated with Support Vector Machines (SVM) and K-Nearest Neighbor (KNN), and k-fold cross-validation is used to improve and validate the results, giving close to 90% accuracy for all three ML models. RF is successful at identifying important features related to Cs-137 emissions: the classification model is first used to identify the top features, which are then used to further train the models to identify important input features. The discovered input features are important both within their individual groups and when all features are included simultaneously. The large number of features did not disrupt RF much, but the skewed dataset, with few events classified as extreme, kept the accuracy lower, at near 90%.
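The k-fold cross-validation step can be sketched as follows. Because the abstract notes a skewed dataset with few extreme events, this sketch uses a stratified variant that deals each class round-robin across the folds so every test fold contains some rare events; the abstract does not say whether the thesis stratified its folds. `fit_predict` is a hypothetical callable standing in for RF, SVM or KNN training and prediction.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds roughly keep the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # deal each class round-robin so every fold gets its share of rare events
        for pos, i in enumerate(indices):
            folds[(offset + pos) % k].append(i)
        offset += len(indices)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def cross_val_accuracy(X, y, fit_predict, k=5, seed=0):
    """Mean test accuracy; fit_predict(X_train, y_train, X_test) returns predicted labels."""
    scores = []
    for train, test in stratified_kfold(y, k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)
```

On a toy dataset with 10% rare labels, a classifier that always predicts the majority class already scores 90% under this procedure. This is why per-class metrics are worth reporting alongside accuracy on skewed data.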
|
25 |
Exploring the Noise Resilience of Combined Sturges Algorithm
Agarwal, Akrita. January 2015.
No description available.
|
26 |
Machine Learning for Malware Detection in Network Traffic
Omopintemi, A.H.; Ghafir, Ibrahim; Eltanani, S.; Kabir, Sohag; Lefoane, Moemedi. 19 December 2023.
Developing advanced and efficient malware detection systems is becoming increasingly important in light of the growing threat landscape in cybersecurity. This work aims to tackle the enduring problem of identifying malware and protecting digital assets from cyber-attacks. Conventional methods frequently fail to adapt to the ever-evolving field of harmful activity. Novel approaches are therefore needed that improve precision while taking into account the ever-changing landscape of modern cybersecurity problems. To address this problem, this research focuses on the detection of malware in network traffic. This work proposes a machine-learning-based approach for malware detection, with particular attention to the Random Forest (RF), Support Vector Machine (SVM), and AdaBoost algorithms. The models' performance was evaluated using a set of assessment metrics: Accuracy (AC) for overall performance, Precision (PC) for positive predictive value, Recall Score (RS) for true positives, and the F1 Score (SC) for a balanced viewpoint. A performance comparison reveals that the model built with AdaBoost has the best performance. All three classifiers achieve a true positive rate (TPR) above 97% and a false positive rate (FPR) below 4%. The model created in this paper has the potential to help organizations or experts anticipate and handle malware, and it can be used to make forecasts and provide management solutions in the network's everyday operational activities.
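The evaluation metrics named above all derive from the four confusion-matrix counts of a binary detector. This is a minimal sketch assuming label `1` marks malware; the function name and the choice to return 0.0 for empty denominators are illustrative assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary detector (positive = malware)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall is the same quantity as TPR
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "tpr": recall,
        "fpr": ratio(fp, fp + tn),
    }
```

Note that recall and TPR are the same number. FPR is computed only over the benign traffic, which is why a detector can combine a high TPR with a low FPR.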
|
27 |
Machine Learning Algorithms to Predict Cost Account Codes in an ERP System : An Exploratory Case Study
Wirdemo, Alexander. January 2023.
This study aimed to investigate how Machine Learning (ML) algorithms can be used to predict the cost account code to use when handling invoices in an Enterprise Resource Planning (ERP) system commonly found in the Swedish public sector. This involved testing which of the selected algorithms performs best and which criteria must be met for it to do so. Previous studies on ML for invoice classification have focused on either the accounts payable side or the accounts receivable side of the balance sheet. These studies have used a variety of methods, involving not only common ML algorithms such as Random forest, Naïve Bayes, Decision tree, Support Vector Machine, Logistic regression, Neural network and k-nearest Neighbor, but also other classifiers such as rule classifiers and naïve classifiers. The general conclusion from previous studies is that several algorithms can classify invoices with a satisfactory accuracy score, and that Random forest, Naïve Bayes and Neural network have shown the most promising results. The study was performed as an exploratory case study. The case company was a small municipality where finance clerks handle received invoices through an ERP system. The accounting step of invoice handling involves selecting the proper cost account code before submitting the invoice for review and approval. The data consisted of invoice summaries containing the organization number, bankgiro, postgiro and account code used. The algorithms selected for the task were the supervised learning algorithms Random forest and Naïve Bayes and the instance-based algorithm k-Nearest Neighbor (k-NN). The findings indicate that ML could be used to predict which cost account code to use by providing a pre-filled suggestion when the clerk opens the invoice. Among the algorithms tested, Random forest performed best with 78% accuracy (Naïve Bayes and k-NN achieved 69% and 70% accuracy, respectively).
One reason for this is Random forest’s ability to handle several input variables, generate an unbiased estimate of the generalization error, and give information about the relationship between the variables and the classification. However, a high number of data observations (support) is needed for the algorithm to perform at its best, with 335 occurrences as a guiding number in this case.
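Of the three algorithms compared, Naïve Bayes is the simplest to sketch for categorical invoice fields such as organization number, bankgiro and postgiro. The following is a minimal sketch assuming each invoice summary is a tuple of those three strings. The class name, the add-one smoothing, and the toy account codes and identifiers in the usage note are illustrative assumptions, not data or code from the study.

```python
import math
from collections import Counter, defaultdict


class CategoricalNB:
    """Naive Bayes over categorical features with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # counts[label][feature_index][value] -> number of training rows
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.vocab = defaultdict(set)  # distinct values observed per feature index
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.counts[label][j][value] += 1
                self.vocab[j].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, c in self.class_counts.items():
            score = math.log(c / self.n)  # log prior
            for j, value in enumerate(row):
                seen = self.counts[label][j][value]
                # the extra +1 reserves probability mass for values never seen in training
                score += math.log((seen + 1) / (c + len(self.vocab[j]) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For example, after `fit` on four toy invoices where supplier `"556000-0001"` was always booked to a hypothetical account `"4010"`, `predict(("556000-0001", "123-4567", ""))` suggests `"4010"`. This suggestion would pre-fill the field when the clerk opens the invoice.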
|
28 |
Artificial intelligence and Machine learning : a diabetic readmission study
Forsman, Robin; Jönsson, Jimmy. January 2019.
The maturing of Artificial intelligence provides great opportunities for healthcare, but also comes with new challenges. For Artificial intelligence to be used adequately, a comprehensive analysis of the data is necessary, along with testing the data on multiple algorithms to determine which algorithm is appropriate to use. In this study, a dataset was gathered of patients who either were or were not readmitted to hospital within 30 days after being admitted. The data was then analyzed with different algorithms, and the results were compared to determine the most appropriate algorithm to use.
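Before any algorithm can be compared, each hospital stay needs a binary readmission label. This is a hypothetical sketch of how such a label could be derived from raw stay records. It assumes each record is `(patient_id, admit_date, discharge_date)` and that the 30-day window starts at discharge of the earlier stay, which is a common convention the abstract does not confirm. The record layout and function name are assumptions.

```python
from datetime import date


def readmission_labels(stays, window_days=30):
    """stays: list of (patient_id, admit_date, discharge_date). Returns a 0/1 label per stay.

    A stay is labeled 1 if the same patient's next admission starts within
    window_days after that stay's discharge (assumed convention).
    """
    by_patient = {}
    for idx, (pid, admit, discharge) in enumerate(stays):
        by_patient.setdefault(pid, []).append((admit, discharge, idx))
    labels = [0] * len(stays)
    for visits in by_patient.values():
        visits.sort()  # chronological order by admission date
        for (_, d1, i1), (a2, _, _) in zip(visits, visits[1:]):
            if 0 <= (a2 - d1).days <= window_days:
                labels[i1] = 1
    return labels
```

Only the next admission needs checking, because any later admission starts even further from the discharge.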
|
29 |
Classification Of Forest Areas By K Nearest Neighbor Method: Case Study, Antalya
Ozsakabasi, Feray. 01 June 2008.
Among the various remote sensing methods that can be used to map forest areas, the K Nearest Neighbor (KNN) supervised classification method is becoming increasingly popular for creating forest inventories in some countries. In this study, the utility of the KNN algorithm is evaluated for forest/non-forest/water stratification, with Antalya selected as the study area. The data consist of Landsat TM and Landsat ETM satellite images acquired in 1987 and 2002, respectively, the SRTM 90-meter digital elevation model (DEM), and land use data from 2003. The accuracies of different modifications of the KNN algorithm are evaluated using Leave-One-Out cross-validation, a special case of K-fold cross-validation, and traditional accuracy assessment using error matrices. The best parameters are found to be the Euclidean distance metric, inverse distance weighting, and k = 14, using bands 4, 3 and 2. With these parameters, the cross-validation error is 0.009174 and the overall accuracy is around 86%. The results are compared with those of the Maximum Likelihood algorithm. The KNN results are found to be accurate enough for the method to be practically applicable to mapping forest areas.
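The best configuration reported above (Euclidean distance, inverse distance weighting, k = 14) and the Leave-One-Out evaluation can be sketched as follows. The sketch assumes each sample is a tuple of band values (for example bands 4, 3 and 2) paired with a class label. The small `eps` added to avoid division by zero and the function names are illustrative assumptions.

```python
import math
from collections import defaultdict


def idw_knn_predict(train, query, k=14, eps=1e-12):
    """train: list of (feature_vector, label). Each of the k nearest votes with weight 1/distance."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = defaultdict(float)
    for x, label in nearest:
        votes[label] += 1.0 / (math.dist(x, query) + eps)
    return max(votes, key=votes.get)


def loo_error(samples, k=14):
    """Leave-One-Out: classify each sample using all the others, return the error rate."""
    errors = 0
    for i, (x, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if idw_knn_predict(rest, x, k) != label:
            errors += 1
    return errors / len(samples)
```

Leave-One-Out is k-fold cross-validation with as many folds as samples. That costs one full neighbor search per sample, which is affordable for a KNN classifier since it has no training step.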
|
30 |
Time-Series Classification: Technique Development and Empirical Evaluation
Yang, Ching-Ting. 31 July 2002.
Many interesting applications involve predicting a decision based on a time-series sequence or a set of time-series sequences; these are referred to as time-series classification problems. Past classification research predominantly focused on constructing a classification model from training instances whose attributes are atomic and independent. Applying traditional classification techniques directly to time-series classification problems requires transforming time-series data into non-time-series attributes through statistical operations (e.g., average or sum). However, such statistical transformation often results in information loss. In this thesis, we propose the Time-Series Classification (TSC) technique, based on the nearest neighbor classification approach. The empirical evaluation showed that the proposed technique performed better than the statistical-transformation-based approach.
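The information loss described above can be shown with a small sketch. It compares a plain 1-nearest-neighbor classifier using Euclidean distance on the raw sequences against one that compares sequences only by their mean. This is a generic illustration under the assumption of equal-length sequences, not the thesis's TSC technique, whose distance measure the abstract does not specify.

```python
import math


def nn_classify(train, query, dist):
    """1-nearest-neighbor: return the label of the closest training sequence."""
    return min(train, key=lambda s: dist(s[0], query))[1]


def euclidean(a, b):
    """Point-by-point distance on raw sequences (assumes equal length)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_feature_dist(a, b):
    """Statistical transformation: compare sequences only by their averages."""
    return abs(sum(a) / len(a) - sum(b) / len(b))
```

With training sequences `[0, 1, 2, 3, 4]` ("rising") and `[4, 3, 2, 1, 0]` ("falling"), a query `[4, 3, 2, 1, 1]` is classified as "falling" by the raw-sequence distance. Under the mean transformation, both training sequences have the same average, so they are equally far from the query and the classifier cannot tell them apart.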
|