Ransomware is an ever-growing issue that has affected individuals and corporations since its inception, leading to losses on the order of billions each year. This research builds on existing work on ransomware detection for Windows-based platforms that combines behavioral analysis in a sandbox with classification using machine learning (ML). The classifying features are the predefined function calls, known as API (Application Programming Interface) calls, made by ransomware and benign samples. The primary aim of this research is to study how the frequency of API calls affects the classification accuracy of various ML algorithms, using benign samples and ransomware samples that span a large number of families with varied behavior. To this end, an experiment was conducted in which the ML classification algorithms were analyzed quantitatively on two kinds of input: one based on the frequency of API calls, and a binary one based only on whether each API call occurs. The analysis concluded that considering the frequency of API calls marginally improves the ransomware recall rate. The secondary research question seeks to justify the ML classification of ransomware through behavioral analysis of ransomware and goodware, focusing on the API calls that had a major effect on classification. This research provides meaningful insights into the runtime behavior of ransomware and goodware, and shows how that behavior, including API calls and their frequencies, is consistent with the ML-based classification of ransomware.

Master of Science

Ransomware is an ever-growing issue that has affected individuals and corporations since its inception, leading to losses on the order of billions each year.
It infects a user's machine, encrypts user files, locks the user out of their machine, or both, and demands a ransom in exchange for decrypting or unlocking user data. Analyzing ransomware, either statically or behaviorally, is a prerequisite for building detection mechanisms and countermeasures. This research is based on behavioral analysis, in which ransomware is executed in a safe sandboxed environment, such as a virtual machine, to avoid infecting a real user's machine, and its runtime characteristics are extracted for analysis. Among these characteristics, the predefined function calls, known as API (Application Programming Interface) calls, that ransomware makes to the system serve as the basis for classifying ransomware and benign software. Ransomware samples from various families and benign samples were analyzed in a sandboxed environment, and the curated dataset, with API calls as features, was fed to a set of ML algorithms capable of extracting useful information from the data to make classification decisions without human intervention. The research examines how the frequency of API calls affects classification accuracy and identifies the APIs most important for classification, along with their potential use by ransomware and goodware, to justify the ML classification. Zero-day detection, which tests the accuracy of trained ML models on unknown ransomware samples and families, was also performed.
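The two input representations compared in this research (API call frequency versus binary presence), together with the recall metric and a zero-day style evaluation, can be illustrated with a minimal sketch. It assumes each sandbox run yields a plain list of API call names and that a fixed vocabulary of API names is chosen in advance. The API names, family labels, and helper functions (`frequency_features`, `binary_features`, `recall`, `leave_family_out`) are hypothetical and do not reproduce the thesis's actual pipeline. Holding out an entire ransomware family is one common way to approximate testing on unknown families; it is an assumption here, not necessarily the split used in this research.

```python
from collections import Counter


def frequency_features(trace, vocabulary):
    """Count how many times each API in the vocabulary appears in a trace."""
    counts = Counter(trace)
    return [counts[api] for api in vocabulary]  # Counter yields 0 for unseen APIs


def binary_features(trace, vocabulary):
    """Flag (1/0) whether each API in the vocabulary appears in a trace at all."""
    seen = set(trace)
    return [1 if api in seen else 0 for api in vocabulary]


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the positive (ransomware) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


def leave_family_out(samples, held_out_family):
    """Split samples so that one ransomware family is never seen in training.

    Simplification (assumption): every sample outside the held-out family,
    including all benign samples, goes to the training set.
    """
    train = [s for s in samples if s["family"] != held_out_family]
    test = [s for s in samples if s["family"] == held_out_family]
    return train, test


if __name__ == "__main__":
    # Hypothetical vocabulary of Windows API names, for illustration only.
    vocab = ["NtCreateFile", "NtWriteFile", "CryptEncrypt", "FindFirstFileExW"]
    ransom_like = ["FindFirstFileExW", "NtCreateFile", "CryptEncrypt",
                   "NtWriteFile", "CryptEncrypt", "NtWriteFile", "CryptEncrypt"]
    benign_like = ["NtCreateFile", "NtWriteFile"]

    print(frequency_features(ransom_like, vocab))  # [1, 2, 3, 1]
    print(binary_features(ransom_like, vocab))     # [1, 1, 1, 1]
    print(frequency_features(benign_like, vocab))  # [1, 1, 0, 0]

    # Hypothetical ground-truth labels (1 = ransomware) and model predictions.
    print(recall([1, 1, 1, 0], [1, 0, 1, 0]))  # 0.6666666666666666

    # Hypothetical samples; "family_b" plays the role of an unknown family.
    samples = [
        {"family": "family_a", "label": 1},
        {"family": "family_b", "label": 1},
        {"family": "benign", "label": 0},
    ]
    train, test = leave_family_out(samples, "family_b")
    print(len(train), len(test))  # 2 1
```

In this toy trace the binary vector is all ones, while the frequency vector also records that `CryptEncrypt` was called three times; that additional signal is what a frequency-based input can expose to a classifier.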
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/115272 |
Date | 31 May 2023 |
Creators | Karanam, Sanjula |
Contributors | Electrical and Computer Engineering, Wang, Haining, Yao, Danfeng, Marchany, Randolph Carlos |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Thesis |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |