Global ETD Search

391	Data Mining Methods For Malware Detection Siddiqui, Muazzam 01 January 2008 This research investigates the use of data mining methods for malware (malicious program) detection and proposes a framework as an alternative to traditional signature-based detection. Traditional approaches that use signatures to detect malicious programs fail for new and unknown malware, for which signatures are not available. We present a data mining framework to detect malicious programs. We collected, analyzed, and processed several thousand malicious and clean programs to find the best features and build models that can classify a given program as malware or clean. Our research is closely related to information retrieval and classification techniques and borrows a number of ideas from the field. We used a vector space model to represent the programs in our collection. Our data mining framework includes two separate and distinct classes of experiments. The first are supervised learning experiments that used a dataset of several thousand malicious and clean program samples to train, validate, and test an array of classifiers. In the second class of experiments, we proposed using sequential association analysis for feature selection and automatic signature extraction. In our experiments, we achieved a detection rate as high as 98.4% and a false positive rate as low as 1.9% on novel malware. Data Mining Malware Detection Machine Learning Classification Instruction Sequences Signature Extraction Predictive Modeling Supervised Learning Unsupervised Learning Feature Selection Feature Reduction Categorical Data Analysis
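As a rough illustration of the vector space representation mentioned in entry 391, the sketch below turns a program's instruction sequence into a term-frequency vector over instruction n-grams and compares two programs by cosine similarity. The opcode lists, the choice of bigrams, and the function names `ngram_vector` and `cosine_similarity` are all assumptions made for this example; the abstract does not say which features or similarity measure the thesis actually used.

```python
from collections import Counter
from math import sqrt


def ngram_vector(opcodes, n=2):
    """Count overlapping n-grams of an opcode sequence (sparse term-frequency vector)."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical instruction sequences, invented for illustration only.
prog_a = ["push", "mov", "call", "pop", "ret"]
prog_b = ["push", "mov", "call", "pop", "ret"]
prog_c = ["xor", "jmp", "xor", "jmp", "nop"]

vec_a, vec_b, vec_c = (ngram_vector(p) for p in (prog_a, prog_b, prog_c))
print(cosine_similarity(vec_a, vec_b))  # identical sequences
print(cosine_similarity(vec_a, vec_c))  # no shared bigrams
```

In a full pipeline, vectors like these would be the input to feature selection and to the classifiers; that part is not sketched here.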
392	Comparision of Machine Learning Algorithms on Identifying Autism Spectrum Disorder Aravapalli, Naga Sai Gayathri, Palegar, Manoj Kumar January 2023 Background: Autism Spectrum Disorder (ASD) is a complex neurodevelopmental disorder that affects social communication, behavior, and cognitive development. Patients with autism have a variety of difficulties, such as sensory impairments, attention issues, learning disabilities, mental health issues like anxiety and depression, as well as motor and learning issues. The World Health Organization (WHO) estimates that one in 100 children have ASD. Although ASD cannot be completely treated, early identification of its symptoms might lessen its impact. Early identification of ASD can significantly improve the outcome of interventions and therapies. So, it is important to identify the disorder early. Machine learning algorithms can help in predicting ASD. In this thesis, Support Vector Machine (SVM) and Random Forest (RF) are the algorithms used to predict ASD. Objectives: The main objective of this thesis is to build and train the models using machine learning (ML) algorithms with the default parameters and with hyperparameter tuning, and to find the most accurate model, based on the comparison of the two experiments, to predict whether a person is suffering from ASD or not. Methods: Experimentation is the method chosen to answer the research questions. Experimentation helped in finding out the most accurate model to predict ASD. Experimentation is followed by data preparation with splitting of data and by applying feature selection to the dataset. After the experimentation followed by two experiments, the models were trained to find the performance metrics with the default parameters, and the models were trained to find the performance with hyperparameter tuning. Based on the comparison, the most accurate model was applied to predict ASD. 
Results: In this thesis, we chose two algorithms, SVM and RF, to train the models. After experimentation and training of the models with hyperparameter tuning, SVM obtained the highest accuracy and F1 scores on the test data, 96% and 97% respectively, compared with the other model, RF. Conclusions: The models were trained using two ML algorithms, SVM and RF, in two experiments. In experiment 1, the models were trained using default parameters, and accuracy and F1 scores were obtained for the test data. In experiment 2, the models were trained using hyperparameter tuning, and the same performance metrics, accuracy and F1 score, were obtained for the test data. By comparing the performance metrics, we came to the conclusion that SVM is the most accurate algorithm for predicting ASD. Autism Spectrum Disorder(ASD) Classification Data pre-processing Feature selection Machine learning algorithms Random Forest Classifier Support Vector Classifier. Computer Engineering Datorteknik Computer Sciences Datavetenskap (datalogi)
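Entry 392 compares models by accuracy and F1 score on test data. The following is a minimal sketch of how those two metrics are computed for a binary classifier. The labels are made up for the example and have nothing to do with the thesis dataset; `accuracy_and_f1` is a hypothetical helper name.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; F1 is taken for the `positive` class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Invented labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} f1={f1:.3f}")
```

Accuracy and F1 can diverge when classes are imbalanced, which is one reason an abstract might report both.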
393	Graph Cut Based Mesh Segmentation Using Feature Points and Geodesic Distance Liu, L., Sheng, Y., Zhang, G., Ugail, Hassan January 2015 Both prominent feature points and geodesic distance are key factors for mesh segmentation. Building on these two factors, this paper proposes a graph cut based mesh segmentation method. The mesh is first preprocessed by Laplacian smoothing. Candidate feature points are then selected by applying a predefined threshold to the Gaussian curvature. With DBSCAN (Density-Based Spatial Clustering of Applications with Noise), the selected candidate points are grouped into clusters, and the point with the maximum curvature in each cluster is taken as a final feature point. We label these feature points and regard the faces of the mesh as nodes for the graph cut. Our energy function is constructed from the ratio between the geodesic distance and the Euclidean distance of vertex pairs of the mesh. The final segmentation result is obtained by minimizing the energy function using graph cut. The proposed algorithm is pose-invariant and can robustly segment the mesh into different parts in line with the selected feature points.
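Entry 393 builds its energy function on the ratio between the geodesic and the Euclidean distance of vertex pairs. The sketch below only illustrates that ratio, under a simplifying assumption: the geodesic distance is approximated by the shortest path along mesh edges (Dijkstra's algorithm). The abstract does not say how the paper computes geodesics or how the ratio enters the energy terms. The tiny vertex and edge lists are invented, and the function names are hypothetical.

```python
import heapq
from math import dist  # Euclidean distance between two points (Python 3.8+)


def geodesic_distances(vertices, edges, source):
    """Shortest-path distances along mesh edges from `source` (Dijkstra)."""
    adjacency = {i: [] for i in range(len(vertices))}
    for u, v in edges:
        w = dist(vertices[u], vertices[v])  # edge weight = edge length
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))

    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adjacency[u]:
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best


def geodesic_euclidean_ratio(vertices, edges, u, v):
    """Ratio >= 1; assumes u and v are distinct and connected."""
    geodesic = geodesic_distances(vertices, edges, u)[v]
    return geodesic / dist(vertices[u], vertices[v])


# Invented U-shaped strip: the end points are close in space but far along the surface.
vertices = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
edges = [(0, 1), (1, 2), (2, 3)]
print(geodesic_euclidean_ratio(vertices, edges, 0, 3))
print(geodesic_euclidean_ratio(vertices, edges, 0, 1))
```

A ratio near 1 means two vertices are connected almost in a straight line across the surface, while a large ratio suggests they sit on different protrusions, which is the intuition behind using it to separate parts.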
394	Two papers on car fleet modeling Habibi, Shiva January 2013 QC 20130524 Car fleet modeling car type choice discrete choice modeling prediction aggregation of alternatives cross-validation feature-selection Transport Systems and Logistics Transportteknik och logistik
395	Predicting Biomarkers/ Candidate Genes involved in iALL, using Rough Sets based Interpretable Machine Learning Model. Pulinkala, Girish January 2023 Acute lymphoblastic leukemia (ALL) is a hematological malignancy that gains a proliferative advantage and originates in the bone marrow. One of the more common genetic alterations in ALL is KMT2A rearrangement, which accounts for 80% of the cases of ALL in infants. Patients carrying the KMT2A rearrangement have a poor prognosis and will eventually develop drug resistance. This project aimed to find new therapeutic targets that would help in the development of novel drugs. We designed a model that uses gene expression data to infer the expression of oncogenes and of genes that could be associated with immune pathways. The data was extracted and transformed by removing batch effects and identifying the biotypes of these genes for more focused research. Because we used exome RNA-seq, it was necessary to reduce the high dimensionality of the data. The dimensionality reduction was performed using Monte Carlo Feature Selection. After the feature selection, a list of highly significant genes was obtained. These genes were used in a machine learning model, R.ROSETTA, which produces rule-based results based on rough set theory. The rules were visualized using VisuNet, an interactive tool that creates networks from the rules. Among others, we identified expression levels of genes such as JAK3, TOX3, and DMRTA1 and their relations to other genes using the machine learning model. These significant genes were also used for pathway analysis with pathfindR, which allowed us to infer the oncogenic pathways. The pathway analysis helped us deduce pathways, such as immunodeficiency and other signaling pathways, that could be potential drug targets. Machine Learning Cancer Oncology Acute Lymphoblastic Leukemia Rough sets Pathway analysis Feature selection Bioinformatics Computational Biology. 
Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
396	Proteomics and Machine Learning for Pulmonary Embolism Risk with Protein Markers Awuah, Yaa Amankwah 01 December 2023 This thesis investigates protein markers linked to pulmonary embolism risk using proteomics and statistical methods, employing unsupervised and supervised machine learning techniques. The research analyzes existing datasets, identifies significant features, and observes gender differences through MANOVA. Principal Component Analysis reduces variables from 378 to 59, and Random Forest achieves 70% accuracy. These findings contribute to our understanding of pulmonary embolism and may lead to diagnostic biomarkers. MANOVA reveals significant gender differences, and applying proteomics holds promise for clinical practice and research. Proteomics Dimension Reduction Random Forest Features Extraction Feature Selection MANOVA Lawley-Hotelling’s Test Pillai’s Test Wilk’s Lambda Roy’s Largest Root. Applied Statistics Biostatistics Statistical Models
397	SELECTION OF FEATURES FOR ML BASED COMMANDING OF AUTONOMOUS VEHICLES Sridhar, Sabarish January 2020 Traffic coordination is an essential challenge in vehicle automation. The challenge is not only about maximizing the revenue/productivity of a fleet of vehicles, but also about avoiding infeasible states, such as collisions and low energy levels, which could make the fleet inoperable. The challenge is hard due to the complex nature of real-time traffic and the large state space involved. Reinforcement learning and simulation-based search techniques have been successful in handling complex problems with large state spaces [1] and are potential candidates for traffic coordination. In this degree project, a variant of these techniques known as Dyna-2 [2] is investigated for traffic coordination. A long-term memory of past experiences is approximated by a neural network and is used to guide a Temporal Difference (TD) search. Various features are proposed and evaluated, and finally a feature representation is chosen to build the neural network model. The Dyna-2 Traffic Coordinator (TC) is investigated for its ability to provide supervision for handling vehicle bunching and charging. Two variants of traffic coordinators, one based on simple rules and another based on TD search, are the existing baselines for the performance evaluation. The results indicate that by incorporating learning via a long-term memory, the Dyna-2 TC is robust to vehicle bunching and ensures a good balance in charge levels over time. The performance of the Dyna-2 TC depends on the choice of features used to build the function approximator; a bad feature choice does not provide good generalization and hence results in bad performance. On the other hand, the previous approaches based on rule-based planning and TD search made poor decisions, resulting in collisions and low energy states. 
The search-based approach is comparatively better than the rule-based approach; however, it is not able to find an optimal solution due to the depth limitations. With the guidance from a long-term memory, the search was able to generate a higher return and ensure a good balance in charge levels. / Traffic coordination is a fundamental challenge in making vehicles autonomous. The challenge lies not only in maximizing the revenue/productivity of a vehicle fleet but also in avoiding unsuitable states, such as collisions and energy shortages, which could make the fleet inoperable. The challenge is hard because of the complex nature of real-time traffic and the large state space involved. Reinforcement learning and simulation-based search techniques have been successful methods for handling complex problems with large state spaces [1] and can be seen as potential candidates for traffic coordination. This degree project investigates a variant of these techniques, known as Dyna-2 [2], applied to traffic coordination. A long-term memory of past experiences is approximated by a neural network and used to guide a Temporal Difference (TD) search. Various features are proposed, evaluated, and then combined into a representation around which the network is built. The Dyna-2 Traffic Coordinator (TC) is investigated for its ability to provide decision support for handling bunched vehicles and charging. Two variants of traffic coordinators, one based on simple rules and one based on TD search, are used as the baseline for the performance evaluation. The results indicate that, by including learning via a long-term memory, the Dyna-2 TC is a robust method for handling bunched vehicles and gives a good balance of charge levels over time. The performance of the Dyna-2 TC depends on the choice of features used to build the function approximator; poorer feature choices do not generalize well, which results in poor performance. 
On the other hand, the previous approaches based on rule-based planning and TD search made poor decisions, which resulted in collisions and states with low charge levels. Compared with the rule-based approach, the search-based method is better, but it did not manage to find an optimal solution because of limits on the search depth. With guidance from a long-term memory, the search was able to generate a higher return and ensure a good balance in charge levels. Autonomous Driving Reinforcement Learning Dyna-2 Architecture Function Approximation Feature Selection Machine Learning. Autonom körning Förstärkningsinlärning Dyna-2 Arkitektur Funktionsapproximering Attributval Maskininlärning. Computer and Information Sciences Data- och informationsvetenskap
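Entry 397 stresses that the quality of a learned value function depends on the features fed to the function approximator. The sketch below shows a single temporal-difference, TD(0), update for a value function that is linear in its features. It is a deliberate simplification for illustration: the thesis uses a neural network inside the Dyna-2 architecture, which this does not reproduce. The feature vectors, reward, step size `alpha`, and discount `gamma` are made-up values, and `td0_update` is a hypothetical name.

```python
def value(weights, features):
    """Linear value estimate: dot product of weights and state features."""
    return sum(w * x for w, x in zip(weights, features))


def td0_update(weights, features, reward, next_features,
               alpha=0.1, gamma=0.9, terminal=False):
    """Return new weights after one TD(0) step.

    w <- w + alpha * (r + gamma * v(s') - v(s)) * phi(s)
    """
    target = reward if terminal else reward + gamma * value(weights, next_features)
    td_error = target - value(weights, features)
    return [w + alpha * td_error * x for w, x in zip(weights, features)]


# Invented two-feature state description (for example, a normalized charge level and a gap to the next vehicle).
weights = [0.0, 0.0]
state = [1.0, 0.5]
next_state = [0.0, 1.0]
weights = td0_update(weights, state, reward=1.0, next_features=next_state)
print(weights)
```

Because the update scales each weight change by the corresponding feature value, a feature that carries no information about future return adds noise rather than generalization, which matches the abstract's remark about bad feature choices.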
398	Analysis of Eye Tracking Data from Parkinson’s Patients using Machine Learning Höglund, Lucas January 2021 Parkinson’s disease is a brain disorder associated with reduced dopamine levels in the brain, affecting cognition and motor control. One of the motor functions that can be affected is eye movement, which can therefore be critically impaired in patients with Parkinson’s disease. Eye movement can be measured using eye trackers, and this data can be used for analyzing the eye movement characteristics in Parkinson’s disease. Eye movement analysis provides the possibility of diagnostics and can therefore lead to further insights into Parkinson’s disease. In this thesis, the extraction of clinically relevant features for diagnosing Parkinson’s patients from eye movement data is studied. We used an autoencoder (AE) constructed to learn micro- and macro-scale representations of eye movements and constructed three different models. Learning of the AEs was evaluated using the F1 score, and differences were statistically assessed using the Wilcoxon signed-rank test. Features extracted from data on patients and healthy subjects were visualized using t-SNE. Using the extracted features, we measured differences using cosine and Mahalanobis distances. We furthermore clustered the features using fuzzy c-means. The quality of the generated clusters was assessed by F1 score, fuzzy partition coefficient, Dunn’s index, and silhouette index. Based on successful tests using a test data set from a previous publication, we believe that the network used in this thesis has learned to represent natural eye movement from subjects allowed to move their eyes freely. However, distances, visualizations, and clustering all suggest that latent representations from the autoencoder do not provide a good separation of data from patients and healthy subjects. 
We therefore conclude that a micro-macro autoencoder does not suit the purpose of generating a latent representation of saccade movements of the type used in this thesis. / Parkinson’s disease is a brain disease caused by reduced dopamine levels in the brain, which affects cognition and motor control in the human brain. One of the motor functions that can be affected is eye movement, which can therefore be critically affected in patients diagnosed with Parkinson’s disease. Eye movements can be measured with eye trackers, which in turn can be used to analyze the characteristics of eye movement in Parkinson’s disease. Eye movement analysis offers the possibility of diagnostics and can therefore lead to further understanding of Parkinson’s disease. This thesis studies feature extraction from eye movement data with clinical relevance for diagnosing Parkinson’s patients. We used an autoencoder (AE) constructed to learn micro- and macro-saccade representations of eye movements and constructed three different models. Learning of the AEs was evaluated using the F1 score, and differences were assessed statistically using the Wilcoxon rank test. The feature extraction was visualized with t-SNE, and using the results of the feature extraction we measured differences with cosine and Mahalanobis distances. We also clustered the results of the feature extraction with fuzzy c-means. The quality of the generated clusters was assessed with the F1 score, fuzzy partition coefficient, Dunn’s index, and silhouette index. In summary, we find that a micro-macro autoencoder does not suit the purpose of analyzing artificial eye movement data. We believe that the network used in this thesis has learned to represent natural eye movement from a person allowed to move their eyes freely. Autoencoder Clustering Eye tracking Feature extraction Feature selection Machine learning Parkinson’s disease. 
Autokodare Klustring Blickspårning Särdragsextraktion Särdragsidentifikation Maskininlärning Parkinsons sjukdom. Computer Sciences Datavetenskap (datalogi)
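Entry 398 assesses fuzzy c-means clusters with, among other measures, the fuzzy partition coefficient. As a small worked example, the sketch below computes that coefficient from a membership matrix using the standard definition, the mean over samples of the sum of squared memberships. The membership values are invented, and `fuzzy_partition_coefficient` is a hypothetical helper, not code from the thesis.

```python
def fuzzy_partition_coefficient(memberships):
    """Mean over samples of the sum of squared cluster memberships.

    `memberships` is a list of rows, one per sample; each row holds that
    sample's membership in every cluster and is assumed to sum to 1.
    The result ranges from 1/c (completely fuzzy) to 1.0 (crisp).
    """
    n_samples = len(memberships)
    return sum(u * u for row in memberships for u in row) / n_samples


crisp = [[1.0, 0.0], [0.0, 1.0]]   # every sample belongs to exactly one cluster
fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # every sample is split evenly
mixed = [[0.9, 0.1], [0.2, 0.8]]   # invented intermediate case

for name, u in (("crisp", crisp), ("fuzzy", fuzzy), ("mixed", mixed)):
    print(name, fuzzy_partition_coefficient(u))
```

A value close to 1/c, with c the number of clusters, indicates that the clustering barely separates the samples, which is the kind of outcome the abstract reports for patients versus healthy subjects.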
399	New Clustering and Feature Selection Procedures with Applications to Gene Microarray Data Xu, Yaomin January 2008 No description available. Statistics Bioinformatics coherence index data mining feature selection gene expression pathway gene profiling informative gene microarray data profile cluster analysis partitioning regulatory network statistical pattern recognition
400	Adaptive Mixture Estimation and Subsampling PCA Liu, Peng January 2009 No description available. Statistics large data data mining mixture models Gaussian mixtures parameter estimation adaptive procedure partial EM high-dimensional data large p small n dimension reduction feature selection subsampling
