The large-scale use of PDF, coupled with the format's versatility, has made it an attractive target for carrying and deploying malware. Traditional antivirus software struggles against new malware and against the many obfuscation options that PDF offers. In the search for better detection systems, machine-learning-based detectors have been developed. Although their approaches vary (some examine only structural features of the document, whereas others examine the behavior of embedded code), they generally achieve high accuracy on the data they have been evaluated against. However, structural machine-learning-based PDF malware detectors have been found to be vulnerable to targeted evasion attempts of the kind that may be found in more sophisticated malware. Such evasion attempts typically exploit knowledge of what the detection system associates with 'benign' and 'malicious', either to emulate benign features or to exploit a bug in the implementation, in order to evade the detector. Since the introduction of such evasion attacks, more structural detectors have been developed without mitigations against them. This thesis aggregates the existing knowledge of evasion strategies, applies them against a reproduction of a recent detection system that had not previously been tested against evasion, and finds that it is susceptible to various evasion techniques. Additionally, the reproduced detector is experimentally trained on a combination of the standard data and the recently published CIC-Evasive-PDFMal2022 dataset, which contains malware samples that display evasive properties. The detector trained with evasive samples is tested against the same set of evasion attacks. The results of the two detectors are compared, and the thesis concludes that supplementing the training data with evasive samples yields a more evasion-resilient detector.
The flexibility and versatility of PDF files have made them attractive attack vectors, where a user or a system risks being exposed to malicious code when reading such files. As a countermeasure, format-specific, usually machine-learning-based, detectors have been developed. Given a PDF file, these detectors aim to give one answer, malicious or benign, often by inspecting structural properties of the document. Structural detectors have been shown to be vulnerable to targeted evasion attacks that, by emulating properties of benign documents, manage to smuggle malicious documents past such detectors. Despite this, similar detectors have continued to be developed without implementing defenses against such attacks. This work tests a modern structural detector with evasion attacks consisting of attack files of varying obfuscation levels and confirms that these weaknesses persist. In addition, an experimental countermeasure is evaluated: typically anomalous PDF files (from the CIC-Evasive-PDFMal2022 dataset) are added to the training step when the detector is constructed, in order to identify how this affects its resistance to evasion attacks. The detector variants are tested against the same attack files so that they can be compared with each other. The results show increased resistance in the detector when the anomalous training data is added.
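To make the idea of emulating benign features concrete, the following is a minimal sketch, assuming a toy linear detector over counts of structural PDF keys. The feature names, weights, bias, and the `mimicry_attack` helper are all hypothetical and are not the detector reproduced in the thesis. The attack also assumes the attacker can only add benign-looking structure (extra pages, fonts, images) and must leave the payload's own features, such as `/JS`, intact.

```python
from typing import Dict

# Hypothetical structural features: counts of PDF keys found in a file.
# Positive weights push the toy detector toward "malicious", negative toward "benign".
WEIGHTS: Dict[str, float] = {
    "/JS": 2.0,
    "/OpenAction": 1.5,
    "/Page": -0.4,
    "/Font": -0.3,
    "/XObject": -0.2,
}
BIAS = -1.0


def score(features: Dict[str, int]) -> float:
    """Linear decision score; a value > 0 means the toy detector flags the file."""
    return BIAS + sum(w * features.get(k, 0) for k, w in WEIGHTS.items())


def is_malicious(features: Dict[str, int]) -> bool:
    return score(features) > 0


def mimicry_attack(features: Dict[str, int], budget: int = 50) -> Dict[str, int]:
    """Greedy additive mimicry (hypothetical helper).

    Assumption: the attacker may only *add* structure, never remove the
    payload's features. For a linear model, the cheapest move is to keep
    incrementing the feature with the most negative (most 'benign') weight.
    """
    evasive = dict(features)  # do not mutate the caller's sample
    most_benign = min(WEIGHTS, key=WEIGHTS.__getitem__)
    if WEIGHTS[most_benign] >= 0:
        return evasive  # no benign-looking feature to emulate
    for _ in range(budget):
        if not is_malicious(evasive):
            break  # the sample now evades the detector
        evasive[most_benign] = evasive.get(most_benign, 0) + 1
    return evasive


if __name__ == "__main__":
    sample = {"/JS": 1, "/OpenAction": 1, "/Page": 1}
    print(is_malicious(sample))  # True
    evasive = mimicry_attack(sample)
    print(is_malicious(evasive), evasive["/Page"])  # False 7
```

The additive-only constraint is what makes this a mimicry attack rather than simply deleting the malicious content: the payload keys stay in the file, and the detector is fooled purely by padding it with structure it has learned to associate with benign documents.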
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-315067 |
Date | January 2022 |
Creators | Ekholm, Oscar |
Publisher | KTH, Skolan för elektroteknik och datavetenskap (EECS), Stockholm : KTH Royal Institute of Technology |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | TRITA-EECS-EX ; 2022:305 |