Suspicious URL Filter based on Logistic Regression with Multi-view Analysis / 可疑連結過濾器基於羅吉斯迴歸與多觀點分析

Master's thesis / National Taiwan University of Science and Technology / Department of Computer Science and Information Engineering / academic year 100 / Current techniques for detecting malicious URLs through URL analysis have difficulty finding malicious URLs that have been disguised with obfuscation techniques (e.g., the insertion of benign tokens). In this study, we propose a multi-view approach that reduces the impact of such obfuscation. A URL is composed of several tokens, and each token carries a different meaning. Attackers apply different obfuscation techniques, combining tokens in different portions of the URL, and each technique exhibits its own behavior. The proposed mechanism learns these behaviors from the different portions of a URL (e.g., the authority portion) in order to estimate how suspicious each portion is. By comparing the suspicion level of each portion across URLs, the system selects the most suspicious URLs. This thesis makes the following contributions: (1) providing a multi-view mechanism that reduces the effect of obfuscation techniques, (2) automatically filtering out suspicious URLs without additional configuration or modification, (3) handling large-scale, imbalanced data effectively, and (4) satisfying the requirements of industry.
For the system evaluation, this thesis uses a real data set from T. Co. The requirements of T. Co. are: (1) the detection rate should be below 25%, (2) the missing rate should be below 25%, and (3) one hour of data should be processed within one hour. The experimental results show that our approach is effective: it finds more malicious URLs and satisfies the requirements of both the practical environment and T. Co.
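
The multi-view idea described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the features, the training procedure, or how per-view scores are combined. The sketch assumes three views (authority, path, query), binary bag-of-token features, one small logistic regression per view trained by stochastic gradient ascent, and the maximum per-view score as the overall suspicion level. The names `url_views`, `TokenLogReg`, `train_views`, and `suspicion`, as well as the toy training URLs, are hypothetical.

```python
import math
import re
from urllib.parse import urlsplit

VIEW_NAMES = ("authority", "path", "query")  # assumed set of views


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t]


def url_views(url):
    """Return one token list per view (portion) of the URL."""
    parts = urlsplit(url)
    return {
        "authority": tokenize(parts.netloc),
        "path": tokenize(parts.path),
        "query": tokenize(parts.query),
    }


class TokenLogReg:
    """Logistic regression over binary bag-of-token features (assumed model)."""

    def __init__(self, lr=0.5, epochs=200):
        self.lr, self.epochs = lr, epochs
        self.weights, self.bias = {}, 0.0

    def score(self, tokens):
        # Probability that this view looks malicious.
        z = self.bias + sum(self.weights.get(t, 0.0) for t in set(tokens))
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, samples, labels):
        # Per-sample gradient ascent on the log-likelihood.
        for _ in range(self.epochs):
            for tokens, y in zip(samples, labels):
                step = self.lr * (y - self.score(tokens))
                self.bias += step
                for t in set(tokens):
                    self.weights[t] = self.weights.get(t, 0.0) + step
        return self


def train_views(urls, labels):
    """Train one model per view on the same labeled URLs."""
    views = [url_views(u) for u in urls]
    return {name: TokenLogReg().fit([v[name] for v in views], labels)
            for name in VIEW_NAMES}


def suspicion(models, url):
    """Return (overall score, per-view scores); overall is the max view score."""
    views = url_views(url)
    per_view = {name: models[name].score(views[name]) for name in VIEW_NAMES}
    return max(per_view.values()), per_view


# Toy, made-up training data: label 1 = malicious, 0 = benign.
TRAIN = [
    ("http://example.com/news/index.html", 0),
    ("http://example.org/docs/guide?page=2", 0),
    ("http://login-verify.example.net/account/update?session=abc", 1),
    ("http://example.com.badhost.test/paypal/login?redirect=confirm", 1),
]
models = train_views([u for u, _ in TRAIN], [y for _, y in TRAIN])

# A URL with a benign-looking authority and query but a suspicious path.
overall, per_view = suspicion(models, "http://example.com/account/update?page=2")
print(overall, per_view)
```

Taking the maximum over views is only one possible combination rule. It is chosen here because a URL padded with benign tokens in one portion can still be flagged by another portion, which matches the motivation stated in the abstract.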

Identifier oai:union.ndltd.org:TW/100NTUS5392041
Date January 2012
Creators Ke-wei Su, 蘇克維
Contributors Han-ming Lee, 李漢銘
Source Sets National Digital Library of Theses and Dissertations in Taiwan
Language en_US
Detected Language English
Type 學位論文 ; thesis
Format 45
