Digital forensics investigations become more time consuming as the amount of data to be investigated grows. Secular growth trends between hard drive and memory capacity just exacerbate the problem. Bloom filters are space-efficient, probabilistic data structures that can represent data sets with quantifiable false positive rates that have the potential to alleviate the problem by reducing space requirements. We provide a framework using Bloom filters to allow fine-grained content identification to detect similarity, instead of equality. We also provide a method to compare filters directly and a statistical means of interpreting the results. We developed a tool--md5bloom--that uses Bloom filters for standard queries and direct comparisons. We provide a performance comparison with a commonly used tool, md5deep, and achieved a 50% performance gain that only increases with larger hash sets. We compared filters generated from different versions of KNOPPIX and detected similarities and relationships between the versions.
Identifer | oai:union.ndltd.org:uno.edu/oai:scholarworks.uno.edu:td-2271 |
Date | 15 December 2006 |
Creators | Bourg, Rachel |
Publisher | ScholarWorks@UNO |
Source Sets | University of New Orleans |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | University of New Orleans Theses and Dissertations |
Page generated in 0.0017 seconds